OPERATOR PREFERENCES FOR MOVEMENT
COMPATIBILITY BETWEEN RADAR HAND
CONTROL AND DISPLAY SYMBOLOGY¹

CHARLES S. MORRILL² AND LINDA T. SPRAGUE²

Radio Corporation of America, Burlington Laboratory


In the design of the control and display portions of airborne radar systems, questions always arise regarding the compatibility of the manual movements of a hand control with the corresponding movements of the display symbols (Ely, Thomson, & Orlansky, 1956). This study attempts to answer these questions as they relate to an airborne radar system to be used for air-to-air tracking in the manual mode. The preferences with which this study is concerned involve the relationship between the actual movements of the radar antenna (as activated by the hand control) and the visual representation of these movements on the radar display.


In setting up a model display-control system the authors selected the following characteristics: (a) the signal source for the display symbology was a dual beam scope; (b) three parameters of radar antenna information were presented on a two-dimensional surface; (c) antenna azimuth and range were represented by a single symbol generated by one channel of the dual beam scope and capable of moving along x and y axes simultaneously; (d) antenna elevation was represented by another symbol generated by the second scope channel and capable of vertical movement along the right-hand strip of the square display scope surface. These symbols will be referred to as the azimuth-range symbol and the elevation symbol.


¹ The authors gratefully acknowledge the assistance of Ross Whistler for the design and construction of the hand control.

² Both authors are presently employed by MITRE Corporation, Bedford, Massachusetts.
The hand control exhibited the following characteristics: (a) the control was a handgrip (to fit an average size man's gloved hand) with a serrated wheel mounted at the top in a position readily reached by the thumb; (b) pivotal rotation of the handgrip left and right produced movement of the azimuth-range symbol in azimuth; (c) pivotal rotation forward and backward produced movement of the azimuth-range symbol in range; (d) forward and backward rotation of the thumbwheel produced movement of the elevation symbol.

A current recommendation (Dunlap & Associates, 1957) for hand control design is based on the following control-display relationships, which are considered optimum or most "natural": (a) left and right rotation of the hand control would produce, respectively, changes to the left and right in azimuth of both the antenna and the azimuth-range symbol on the display; (b) forward and backward rotation of the hand control would produce, respectively, increasing and decreasing antenna range, represented on the display, respectively, by upward and downward movement; (c) forward and backward rotation of the thumbwheel would produce, respectively, changes downward and upward of both the antenna and the elevation symbol on the display. When the three parameters of antenna information are displayed and controlled in the manner just described, the result is an incompatibility in the direction of displayed motion between the azimuth-range symbol and the elevation symbol: a forward motion of the hand control results in an upward motion of the azimuth-range symbol, while a similar forward motion of the thumbwheel results in a downward motion of the elevation symbol.
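A small lookup table makes the direction conflict described above concrete. The sketch below is illustrative only. The dictionary names and the up/down/left/right vocabulary are assumptions of this example. The table encodes relationships (a) to (c) as stated, together with a hypothetical alternative in which only the handgrip's range direction is reversed. A helper then reports any control motion that drives the two symbols in different display directions.

```python
# Hypothetical encoding of the recommended relationships (a)-(c):
# keys are (control, motion), values are (display symbol, direction on the scope).
RECOMMENDED = {
    ("handgrip", "left"): ("azimuth-range symbol", "left"),
    ("handgrip", "right"): ("azimuth-range symbol", "right"),
    ("handgrip", "forward"): ("azimuth-range symbol", "up"),      # range increases
    ("handgrip", "backward"): ("azimuth-range symbol", "down"),   # range decreases
    ("thumbwheel", "forward"): ("elevation symbol", "down"),      # antenna elevation lowered
    ("thumbwheel", "backward"): ("elevation symbol", "up"),       # antenna elevation raised
}


def conflicting_motions(mapping):
    """Return each control motion that moves different symbols in different directions."""
    directions_by_motion = {}
    for (_control, motion), (_symbol, direction) in mapping.items():
        directions_by_motion.setdefault(motion, set()).add(direction)
    return {m: d for m, d in directions_by_motion.items() if len(d) > 1}


# Hypothetical fully compatible alternative: reverse only the handgrip's range
# direction, so that a backward motion of either control moves its symbol upward.
COMPATIBLE = dict(RECOMMENDED)
COMPATIBLE[("handgrip", "forward")] = ("azimuth-range symbol", "down")
COMPATIBLE[("handgrip", "backward")] = ("azimuth-range symbol", "up")

print(conflicting_motions(RECOMMENDED))  # forward and backward each map to both up and down
print(conflicting_motions(COMPATIBLE))   # {}
```

Under the recommended mapping, both forward and backward motions are flagged. The alternative removes the conflict without touching the thumbwheel.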


PROBLEM AREAS 


Historically, there are two distinct ways of 
conceptualizing relationships between aircraft 
controls and aircraft display symbology. In 
one conceptualization the display presents a 
representation of the functional operation of 
the aircraft (equipment), in this case the 
movement of the antenna. In the other con- 
ceptualization the display is a direct repre- 
sentation of the hand control movements 
themselves with no reference to actual an- 
tenna movements. In the recommended hand 
control configuration the relationships be- 
tween control movement and resultant dis- 
play symbology movement are based on the 
former conceptualization, i.e., it is assumed 
that the operator views his display as a rep- 
resentation of antenna movement. The result- 
ant movement of display symbology exhibits 
the incompatibility previously described. 

The authors hypothesized that hand control design based on control-display relationships which were entirely compatible and which ignored the functional operation of the antenna would result in fewer control reversals in times of stress or fatigue. Therefore, this study was initiated to determine whether: (a) the radar display is viewed as a functional representation of antenna movement or as a direct representation of the hand control movements; (b) those who prefer the direct representation also indicate a preference concerning the direction of movement of the azimuth-range symbol to make it compatible with the movement of the elevation symbol.


SAMPLE 


Three groups of subjects were surveyed. One was a group of technical and nontechnical personnel from the RCA Burlington Laboratory with prior aircraft experience; the second, a group of technical and nontechnical personnel from the same laboratory who did not have aircraft experience. The third group comprised current F-94 and F-89 radar operators from Otis Air Force Base. For the purpose of the analysis these groups were considered as follows: 1. Group A: 22 Ss from Otis Air Force Base, all on active duty as radar operators (experience ranges from 60 to 1200 hours airborne scope time); 2. Group B: 20 Ss from the Burlington Laboratory with no aircraft experience either as pilots or radar operators; 3. Group C: 20 Ss from the Burlington Laboratory with prior aircraft experience either as pilots or radar operators (experience ranges from 4 to 1200 airborne hours); 4. Group D: Groups B and C together (N = 40).


METHOD 


Personnel from the Burlington Laboratory were seated before an experimental console containing a radar operator's scope and a prototype of the recommended hand control mounted on an arm rest to the right. In order to survey the Otis personnel, the hand control was presented but the display was illustrated by a drawing.

A static display as shown in Fig. 1 was presented on the scope. Target azimuth-range information was presented by a dot on the central portion of the scope; target elevation information by a dot on the right-hand strip of the scope. The azimuth-range symbol on the central portion of the scope was represented by a small circle; the elevation symbol by a short horizontal line on the right-hand strip of the scope. A sketch of the hand control appears in Fig. 2.

After receiving a brief explanation of the meaning of the display symbology and minimal information concerning the hand control's capabilities, all Ss were asked the following questions:

1. If the target appeared here [inline sketch of the scope] (the spot was indicated with a stylus) on the scope, would you move the hand control left or right to move the hand control blip (azimuth-range symbol) over the target dot?

2. If the target appeared here [inline sketch of the scope] on the scope, would you move the hand control forward or backward to move the hand control blip (azimuth-range symbol) over the target dot?

3. If the target appeared here [inline sketch of the scope] on the scope, would you move the thumbwheel forward or backward to move the hand control blip (elevation symbol) over the target dot?

4. Do you visualize this display as a forward-looking (a representation of hand control movements) or as a downward-looking (a representation of the actual antenna movements) display?

In asking these questions, the term "hand control blip" was used consistently to refer to the display symbols. It had been explained initially as "controlled by the hand control," in order not to structure the situation. Every attempt was made to avoid influencing the Ss' conceptualization of the display by the choice of terminology used in asking the questions.

TABLE 1
CONTROL-DISPLAY COMPATIBILITY

Group  Subjects                                    Prefer Compatible   Prefer Noncompatible   Level of
                                                   Control-Display     Control-Display        Confidence
                                                   Relationship        Relationship
A      Otis AFB Radar Operators                    19                  3
B      RCA, No Aircraft Experience
C      RCA, Aircraft Experience
D      RCA, With and Without Aircraft Experience
       (Composite of Groups B & C)

[Only the Group A counts can be placed from the scan; one further legible entry, 18, cannot be assigned to a cell, and the confidence levels are not legible.]

TABLE 2
PREFERENCE FOR DIRECTION OF MOVEMENT

Group  Subjects                                    Hand Control Forward,    Hand Control Backward,   Total N   Level of
                                                   Display Symbols Upward   Display Symbols Upward             Confidence
A      Otis AFB Radar Operators                                             17                       19
B      RCA, No Aircraft Experience
C      RCA, Aircraft Experience
D      RCA, With and Without Aircraft Experience
       (Composite of Groups B & C)

[Only the Group A entries 17 and 19 are legible in the scan.]


RESULTS 

1. Preferences for compatible versus noncompatible displays are indicated in Table 1, together with their associated chi square confidence levels. 2. Table 2 shows, for those Ss who preferred the compatible display (Table 1), the preferred direction of motion of the hand control to effect the corresponding symbol movements on the scope. The associated chi square confidence levels are also indicated there. 3. Of the entire sample (N = 62), only one S indicated a preference for an inverse azimuthal relationship between hand control motion and display symbol movement. This S, however, had had extensive experience with ship tillers, where the inverse relationship usually obtains.
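The paper reports chi square confidence levels but does not state which test produced them. The sketch below is a hedged illustration only. It assumes a one-degree-of-freedom goodness-of-fit test against an even split between the two preferences, computed without Yates' correction. It also assumes that the legible 19 and 3 entries of Table 1 are the Group A counts (N = 22). Under those assumptions the statistic exceeds 10.83, the .001 critical value for one degree of freedom.

```python
from math import erfc, sqrt


def chi_square_even_split(a: int, b: int) -> tuple[float, float]:
    """Chi-square goodness of fit of two counts against a 50/50 split (df = 1, no Yates correction)."""
    expected = (a + b) / 2
    chi2 = ((a - expected) ** 2 + (b - expected) ** 2) / expected
    # With df = 1, chi2 is the square of a standard normal, so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


chi2, p = chi_square_even_split(19, 3)   # assumed Group A counts from Table 1
print(f"chi2 = {chi2:.2f}, p = {p:.5f}")  # chi2 = 11.64
```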


CONCLUSIONS 


The results of this survey indicate that all 
groups in this study preferred a compatible 
display-control relationship. Both Otis radar 
operators and RCA personnel indicated sta- 
tistically significant preferences for the dis- 
play which represents directly the hand con- 
trol movements, rather than the display which 
represents the functional operation of the antenna. All chi squares related to this problem
were statistically significant beyond the .05 
level of confidence. 

Only the Otis radar operators showed a 
strong preference for executing a backward 
motion of both the hand control and the 
thumbwheel to effect an upward movement 
of the display symbols. This preference was 
statistically significant beyond the .001 level 
of confidence. The radar operators consider 
the hand control to operate in a manner simi- 
lar to the pilot’s control stick, i.e., a back- 
ward motion of the control produces an up- 
ward movement of the display symbol. 

Among the personnel of the Burlington 
Laboratory there does not appear, from this 
study, to be any statistically significant pref- 
erence regarding the relationship between a 
backward motion of the hand control and 
thumbwheel, and the corresponding move- 
ments of the display symbols, either upward 
or downward. 


REFERENCES 
Ely, J. H., Thomson, R. M., & Orlansky, J. Layout of workplaces. Wright Air Development Center, September 1956. Tech. Rep. 56-171; AD 110507.
Dunlap and Associates. Design of hand control for NAV/A1. Unpublished confidential report, 1957.


(Early publication received September 4, 1959) 





Journal of Applied Psychology 
1960, Vol. 44, No. 3, 141-145 


A TABLE FOR COMPUTING THE PHI COEFFICIENT

HAROLD A. EDGERTON

Richardson, Bellows, Henry & Company


When faced with a research job which en- 
tailed the computation of at least 1,770 phi 
coefficients, some way to reduce the work was 
sought. The answer was the accompanying 
table. 

The value of the phi coefficient can be found from the table with its accompanying nomograph when three items of information are known. These three items are the marginal relative frequency $p_1$, the per cent "rights" on the first variable ($x$); the marginal relative frequency $p_2$, the per cent "rights" on the second variable ($y$); and $p_{12}$, the relative frequency of the intersected cell, showing the per cent "rights" on both variables.


The procedure is as follows: 


1. From $p_1$ and $p_2$, compute $S = \sqrt{p_1 p_2}$ using the nomograph.
2. Enter the $S = \sqrt{p_1 p_2}$ column of the table, following down to the row identified by the value of $p_{12}$. The table entry in this cell is the sought value of $\phi$.


For example: The correlation between two test items, Numbers 13 and 17, may be expressed as a phi coefficient. The observed data are shown in the accompanying diagram. It is noted that $p_1$ (rights for Item 17) is 120 = 60%, $p_2$ (rights for Item 13) is 100 = 50%, and $p_{12}$ (rights on both items) is 80 = 40%.

1. The value of $\sqrt{p_1 p_2} = \sqrt{.60 \times .50} = .55$ is obtained from the nomograph.
2. Entering the table in Column $S = \sqrt{p_1 p_2} = .55$, go down to Row $p_{12} = 40\%$. The table entry at that intersection is .39, the value sought.

Item 17 \ Item 13     Not Right     Right     Total
Right                    40           80       120 = 60% = p1
Not Right                60           20        80
Total                   100          100       200 = N
                                 = 50% = p2

Rights on both items: 80 = 40% = p12.

The table was organized as follows. The formula for the fourfold point correlation coefficient, $\phi$, may be written

$$\phi = \frac{p_{12} - p_1 p_2}{\sqrt{p_1 - p_1^2}\,\sqrt{p_2 - p_2^2}}.$$

This function may be simplified¹ by letting $p_1 = p_2 = \sqrt{p_1 p_2} = S$. Then

$$\phi = \frac{p_{12} - S^2}{S(1 - S)}.$$

¹ This substitution was suggested by M. W. Richardson.


The actual computation of the table was accomplished by setting up the function as two fractions,

$$\phi = \frac{p_{12}}{S(1 - S)} - \frac{S}{1 - S},$$

and taking the differences as indicated.
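The worked example can be checked against both forms of the formula. The sketch below assumes the cell counts implied by the example's marginals: 80 right on both items, 40 right on Item 17 only, 20 right on Item 13 only, and 60 right on neither. It computes the exact fourfold-point $\phi$ alongside the simplified function $\phi = (p_{12} - S^2)/[S(1 - S)]$ on which the table is built. The function names are illustrative and do not come from the paper.

```python
from math import sqrt


def phi_exact(p1: float, p2: float, p12: float) -> float:
    """Fourfold-point (phi) coefficient from the two marginals and the joint proportion."""
    return (p12 - p1 * p2) / (sqrt(p1 - p1**2) * sqrt(p2 - p2**2))


def phi_table(s: float, p12: float) -> float:
    """Simplified function used for the table, obtained by setting p1 = p2 = S = sqrt(p1 * p2)."""
    return (p12 - s**2) / (s * (1 - s))


# Cell counts implied by the Item 17 x Item 13 example (N = 200).
n = 200
both_right, right_17_only, right_13_only, neither = 80, 40, 20, 60
assert both_right + right_17_only + right_13_only + neither == n

p1 = (both_right + right_17_only) / n   # rights on Item 17: .60
p2 = (both_right + right_13_only) / n   # rights on Item 13: .50
p12 = both_right / n                    # rights on both items: .40

s = sqrt(p1 * p2)                       # about .548; the nomograph reading is .55
print(round(phi_exact(p1, p2, p12), 3))  # 0.408  (exact phi)
print(round(phi_table(s, p12), 3))       # 0.404  (simplified form, unrounded S)
print(round(phi_table(0.55, p12), 2))    # 0.39   (simplified form, S read as .55: the table entry)
```

The table value of .39 comes from reading $S$ as .55. The exact coefficient for these data is about .41.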


The nomograph and table were constructed to be used for values of $p_1$, $p_2$, and $\sqrt{p_1 p_2}$ from .20 to .80. Within this range, and where $p_1$ and $p_2$ are not greatly different, the table yields values of $\phi$ which are sufficiently precise for the purposes for which phi coefficients are commonly used.
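The caveat that $p_1$ and $p_2$ should not be greatly different can be checked numerically. The sketch below restates both formulas so that it runs on its own. It compares the simplified table function with the exact coefficient for a few marginal pairs inside the .20 to .80 range. The values of $p_1$, $p_2$, and $p_{12}$ are arbitrary examples, not data from the paper. When $p_1 = p_2$ the two forms agree exactly, because then $S(1 - S) = \sqrt{p_1(1 - p_1)p_2(1 - p_2)}$. The gap widens as the marginals diverge.

```python
from math import sqrt


def phi_exact(p1, p2, p12):
    """Exact fourfold-point coefficient."""
    return (p12 - p1 * p2) / sqrt(p1 * (1 - p1) * p2 * (1 - p2))


def phi_table(p1, p2, p12):
    """Table approximation: both marginals replaced by S = sqrt(p1 * p2)."""
    s = sqrt(p1 * p2)  # the value read from the nomograph
    return (p12 - s * s) / (s * (1 - s))


# Illustrative cases: equal marginals first, then increasingly different ones.
cases = [(0.50, 0.50, 0.40), (0.60, 0.50, 0.40), (0.70, 0.40, 0.35), (0.80, 0.20, 0.20)]
for p1, p2, p12 in cases:
    exact, approx = phi_exact(p1, p2, p12), phi_table(p1, p2, p12)
    print(f"p1={p1:.2f} p2={p2:.2f} p12={p12:.2f}  exact={exact:.3f}  "
          f"table={approx:.3f}  error={approx - exact:+.3f}")
```

In the last case ($p_1 = .80$, $p_2 = .20$), the exact value is .25 but the simplified form gives about .17. This illustrates why the table is restricted to marginals of similar size.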


Early publication received December 21, 1959 
[Figure: Nomograph for computing $S = \sqrt{p_1 p_2}$]
AN ANALYSIS OF PILOT FLYING PERFORMANCE IN
TERMS OF COMPONENT ABILITIES¹

EDWIN A. FLEISHMAN
Yale University

AND GEORGE N. ORNSTEIN
North American Aviation, Inc., Columbus, Ohio


The job of flying an airplane involves one 
of the most complex perceptual-motor tasks 
found in practice. In response to a continu- 
ally changing set of cues, the pilot must ma- 
nipulate many diverse controls in order to ac- 
complish a specific flight path or set of flight 
conditions. He achieves this by controlling 
the movement of the aircraft along and about 
the vehicle’s three axes. In addition, the pilot 
must monitor and time-share a number of dis- 
plays, must shift attention frequently, and 
must actively schedule his future activities. 
All of these tasks are performed under an es- 
sentially forced-pace condition, since he can- 
not stop and must, in fact, maintain a certain 
speed or the aircraft will stall. 

During World War II and the early post- 
war period, aviation psychologists were highly 
successful in developing procedures for se- 
lecting people for this complex job. The va- 
lidity achieved through the use of objective 
selection tests constitutes one of the major 
practical accomplishments of psychological 
methods (Flanagan, 1947; Fleishman, 1953,
1956; Guilford, 1947; Melton, 1947). The 
criterion of pilot success, which these tests 
were designed to predict, was whether the 
pilot trainee passed or failed during the first 
six months of his training. The limitations of 
the pass-fail criterion, as a measure of pilot 
performance, were recognized. Consequently, 
efforts were made to obtain more analytical 

1 This research was carried out while the authors 
were with the Air Force Personnel and Training Re- 
search Center. The work was done under ARDC 
Project No. 7710 in support of the research and de- 
velopment program of the Air Force Personnel and 
Training Research Center, Lackland Air Force Base, 
Texas. Permission is granted for reproduction, trans- 
lation, publication, use and disposal in whole and in 
part by or for the United States Government. 

The authors are indebted to Ralph E. Flexman 
for his invaluable support and many technical con- 
tributions during the conduct of the study. 


information on the nature of pilot proficiency 
as a basis for developing objective measures 
of flying performance. Initial efforts to ob- 
tain more analytical information were based 
on the analysis of instructor ratings of pilot 
performance. The subjective nature of these 
data often resulted in low reliability or in 
“halo effect” which made meaningful analy- 
ses difficult (Ben-Avi, 1947; Kelly, 1943). A 
thorough review of developments in measur- 
ing pilot performance up to 1952 has been 
presented by Ericksen (1952a). In the late 
postwar program considerable progress was 
made in developing more objective flying per- 
formance measures in connection with pilot 
selection and training studies in the Air Force 
research program (Boyle & Hagin, 1953; 
Flexman, Townsend, & Ornstein, 1954; Orn- 
stein, Nichols, & Flexman, 1954; Sutter, 
Townsend, & Ornstein, 1954). It is from this 
series of studies that a practical and reliable 
in-flight performance measure has emerged. 


PROBLEM 


The present study is concerned with a fac- 
tor analysis of performance in different fly- 
ing maneuvers. The attempt is to specify the 
variance in common between maneuvers which 
may provide insight into the dimensions of 
individual differences in this complex task. 
Essentially this study represents a converg- 
ence of two lines of research. One involves 
the development of analytical objective meas- 
ures of pilot performance (Boyle & Hagin, 
1953; Ericksen, 1952b; Ornstein, Flexman, & 
Nichols, 1954; Smith, Flexman, & Houston, 
1952; Sutter, Townsend, & Ornstein, 1954).
The other line of research is represented by 
laboratory studies of experimental tasks con- 
cerned with the isolation of generalizable di- 
mensions of skilled performance (Fleishman, 
1953, 1954, 1956, 1957, 1958a, 1958b, 1959; 


Fleishman & Hempel, 1954a, 1954b, 1955, 
1956; Hempel & Fleishman, 1955; Parker & 
Fleishman, 1959). 


METHOD 
Subjects 


The Ss were 63 graduates of a special Primary 
Pilot Training Program at Goodfellow Air Force 
Base, Texas. The Pilot Aptitude Scores for these Ss 
were distributed normally between 3 and 9 on the 
1 to 9 stanine scale of the Aircrew Classification 
Battery. The hypothesis that there was no difference 
between the distribution found in this sample, and 
a normal distribution of the same mean and variance 
could not be rejected at the 5% level.


Measurement of Performance in the Air 


The performance of these pilots was measured in 
the T-6 aircraft which was the operational train- 
ing aircraft in use at the time. Daily recordings of 
student performance were made by each student’s 
instructor on forms called Daily Progress Record 
Sheets (DPRS). A complete description of the ra- 
tionale, development, and characteristics of the meas- 
uring device may be found elsewhere (Smith, Flex- 
man, & Houston, 1952; Sutter, Townsend, & Ornstein, 
1954). 

There is a separate DPRS for each maneuver. Each contains items which were determined by extensive analysis of the actual performances required in that maneuver (Houston, Smith, & Flexman, 1954). Each item was designed so that it could be recorded categorically as "correct" or "incorrect." The sum of the incorrect items within a maneuver was taken as a maneuver error score. Table 1 presents examples of the items recorded for the maneuver: Power-On Stall. The instructor indicates success or failure of each item with a ✓ or X mark in the box to the right.

In the present analysis a maneuver score for a 
given S is the sum of the first four recorded trials 
for that S flying that maneuver. For any given ma- 
neuver only the first performance (trial) of that ma- 
neuver was recorded during any flight, and no in- 
struction on that maneuver was given until after the 
first performance of the maneuver during that flight. 
Thus, the four trials summed for a given maneuver 
were recorded on different (successive) flights. 
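The scoring rule above fits in a few lines. The sketch assumes that each recorded trial is a list of item marks, True for a ✓ and False for an X. A trial's error score is the count of incorrect items, and the maneuver score is the sum over the first four trials. The item marks are invented for illustration, and the function names are hypothetical.

```python
from typing import Sequence


def trial_errors(marks: Sequence[bool]) -> int:
    """Number of items marked incorrect (False) on one recorded trial."""
    return sum(1 for correct in marks if not correct)


def maneuver_score(trials: Sequence[Sequence[bool]], n_trials: int = 4) -> int:
    """Sum of the error scores on the first n_trials recorded trials (one per flight)."""
    if len(trials) < n_trials:
        raise ValueError(f"need at least {n_trials} recorded trials")
    return sum(trial_errors(t) for t in trials[:n_trials])


# Hypothetical marks for one S on successive flights; the fifth flight is ignored.
flights = [
    [True, False, True, False, True],
    [True, True, False, True, True],
    [True, True, True, True, False],
    [True, True, True, True, True],
    [False, False, True, True, True],
]
print(maneuver_score(flights))  # 2 + 1 + 1 + 0 = 4
```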

An estimate of the reliability of performance on 
each maneuver was obtained as follows. First, the 
test-retest (i.e., flight-flight) intercorrelation was de- 
termined for each successive pair of flights; next, the 
arithmetic average of these intercorrelations was ob- 
tained; and, finally, the average single-ride reliability 
was adjusted by the Spearman-Brown “prophecy- 
formula” so as to correspond to a test of four times 
the length of the single trial. While these coefficients 
cannot be interpreted in traditional reliability terms, 
they do provide a conservative estimate of maneuver 
reliability. 
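The averaging and step-up steps can be written out directly. The sketch assumes the standard Spearman-Brown formula, $r_k = kr/[1 + (k - 1)r]$, with $k = 4$ as described. The three flight-to-flight correlations (for flight pairs 1-2, 2-3, and 3-4) are hypothetical values, not data from the study. They were chosen so that their mean (.18) steps up to about .47, the estimate reported below for Straight and level.

```python
from statistics import mean


def spearman_brown(r: float, k: float) -> float:
    """Reliability of a test k times as long as one with reliability r."""
    return k * r / (1 + (k - 1) * r)


def maneuver_reliability(successive_rs, k: int = 4) -> float:
    """Average the flight-to-flight correlations, then step the average up to k trials."""
    return spearman_brown(mean(successive_rs), k)


rs = [0.15, 0.20, 0.19]                     # hypothetical correlations for successive flight pairs
print(round(maneuver_reliability(rs), 2))   # mean .18 -> 0.47
```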
TABLE 1
EXAMPLE OF ITEMS IN THE DAILY PERFORMANCE
RECORD FOR THE MANEUVER: POWER-ON STALL

Entry
  Gyros Caged
  Looks
  Two Clearing Turns
  Direction (±5°)
  Torque
  Pitch Proper

Recovery
  Direction (±5°)
  Recovery at Stall
  Stick & Throttle Together
  Throttle to Sea-Level Stop
  Aileron Usage
  Torque Correction
  Pitch Control Proper
  M.P. Reduced to 25″


Description of Maneuvers 


The present analysis is based upon the scores for 24 maneuvers selected from the 33 nonacrobatic maneuvers included in the syllabus of flying instruction for this training program. The nine maneuvers excluded from this analysis were eliminated on the basis of a joint consideration of low reliability, high difficulty, and similarity to other maneuvers included. The 24 remaining maneuvers were considered representative of the original group of maneuvers. These maneuvers all involve "contact flying"; that is, cues outside the cockpit were available. Parallel maneuvers in which the S had to rely only on his instruments are not included.

Brief descriptions of the 24 maneuvers included in the present analysis follow. A reliability estimate is also given for each maneuver.

1. Straight and level: The pilot is required to 
maintain a specified altitude and heading. When de- 
viations occur he makes small and frequent correc- 
tions in bank and small but temporally more exten- 
sive corrections in pitch. (Reliability = .47) 

2. 90° Climbing turn: The pilot establishes and 
maintains a specified bank, air speed, rate of turn, 
and power setting until the appropriate recovery 
point. A highly controlled blending of stick and rud- 
der pressures is required continuously throughout the 
maneuver. (r = .60) 

3. Level-off from climbing turn: The pilot at- 
tempts to achieve a level flight attitude as he reaches 
a specified altitude and heading. An anticipatory 
response is required utilizing coordinated elevator, 
aileron, and rudder pressures. This coordination may 
or may not be simultaneous in all three dimensions. 
(r = .52) 
4. Gliding turn: The pilot executes several preparatory procedural items and establishes certain specific flight conditions (including reduction of power) prior to initiating the bank and turn. This maneuver requires the proper anticipation of control pressure changes and a threefold coordination of elevator, aileron, and rudder. Relatively gross movements are used compared to those used during power-on maneuvers. (r = .58)

5. Level-off from gliding turn: The pilot attempts to achieve a level flight attitude as he reaches a specified altitude and heading. Considerable coordination in the alternating manipulation of throttle, elevator, rudder, and aileron controls is required. (r = .57)

6. Take-off: The pilot is required to maintain a 
specific track and establish a proper climb attitude 
while accomplishing a large number of procedural 
items. Rapid and small, sensitive, rudder corrections, 
and the application of continuously changing elevator pressures are called for. (r = .34)

7. Coordination exercise: The pilot is required to 
make several consecutive turns during which he 
maintains a specified bank, turns a given number of 
degrees, and holds his entry altitude. Continuous and 
precise coordination and timing of elevator, rudder, 
and aileron pressures are required. (r = .28) 

8. Straight and level gear check: The pilot is to 
maintain a specified heading and altitude while ac- 
complishing a large number of procedure-type ac- 
tions. Considerable sharing of attention is required 
as well as the ability to anticipate the resultant ef- 
fect of the procedures upon aircraft performance. 
Frequent elevator corrections are normally needed 
to keep the aircraft stable. (r = .48) 

9. Traffic pattern at auxiliary field: The pilot is required to execute a large number of procedural items while flying a predetermined pattern over the ground. Specific procedures must be accomplished in a given sequence and with timing such that the predetermined flight path is accomplished. (r: no estimate)

10. Rectangular pattern: This maneuver is similar 
to Maneuver 9—differing primarily with respect to 
the magnitude of the planning requirements. Here, 
the pilot, in addition to the requirements of Ma- 
neuver 9, must locate some field upon which he may 
land the aircraft safely, and must fly the maneuver 
under less familiar conditions. (r: no estimate) 

11. Three-point landing: This is one of the most 
difficult maneuvers to learn. Proper performance re- 
quires precise timing of changes in pitch attitude, 
the planning of ground track and position, and, at 
times, an unusual correction of aileron and rudder 
to account for wind. Both fine and relatively large 
and abrupt control movements may be required on 
all controls. (r = .39) 

12. Climbing turn from level: The pilot estab- 
lishes proper pitch and bank attitude by integrating 
application of power with coordination of elevator, 
rudder, and aileron. (r = .62) 

13. Landing characteristic stall: This maneuver integrates Maneuvers 4 and 11. It differs from 11 in that no ground path must be maintained and less "timing" is required when changing pitch. The recovery from the stall requires rather abrupt, but not overcontrolled, use of controls and, hence, implies only crude coordination requirements. (r = .62)

14. Power-on stall: The pilot is required to estab- 
lish and maintain a specific pitch attitude until a 
stall occurs. He is then to effect an immediate re- 
covery. Specific but changing pressures must be ap- 
plied to the various controls until the stall occurs. 
Then, brisk throttle and control movements must be 
executed in a coordinated but mechanical fashion. 
(r = .52)

15. Approach to stall: The pilot is required to 
perform as in Maneuver 14 except that he does not 
quite permit the aircraft to stall. Throughout the 
maneuver fine, sensitive, control pressures are ap- 
plied and at no time are the brisk large movements 
used. (r = .60) 

16. Power-off stall: This maneuver is performed 
from a normal glide. The pilot establishes and main- 
tains a specific pitch attitude until the stall occurs. 
The maneuver emphasizes the recovery from the stall, a recovery wherein rather gross control movements are used. (r = .72)

17. Steep turn (360°): The pilot is required to establish and maintain a steep bank, to maintain a specific altitude, and to time his recovery so as to turn exactly 360°. Extremely fine elevator pressure changes must be made during the maneuver, as well as fine aileron and rudder movements. (r = .52)

18. Maximum performance climbing turn: The pilot is required to initiate a steep climbing turn from level flight. No requirement exists for a specific bank, altitude, or degree of turn. This maneuver is a preparatory exercise for a more advanced maneuver. Coordination and timing are not emphasized. (r = .31)

19. Spin: The pilot is required to stall the aircraft from a power-off flight condition. He then must abruptly apply rudder movement in order to cause the spin. During the spin the pilot must maintain his orientation with respect to the earth so as to properly time the application of a sequence of abrupt control movements designed to effect recovery from the spin. (r = .45)

20. Rudder control stall: The pilot is required to initiate the maneuver as in Maneuver 14. However, after the stall occurs he keeps the aircraft in the stalled condition, wings level, until a level flight attitude is attained. He then recovers. Critical to good performance is the extremely fine rudder control used to keep wings level. (r = .65)

21. Slow-flight turn: The pilot initiates the maneuver from straight and level slow flight. Maintaining altitude, he enters and holds a shallow bank and turns a specified number of degrees. The rudder is the primary control in correcting for the high torque condition. Small rudder, aileron, and elevator pressure coordinations are required throughout. (r = .63)

22. Slow-flight recovery: The pilot returns from straight and level slow flight to a normal cruise condition while maintaining both altitude and direction. Primary coordination is between the throttle and rudder. Gradual changes in elevator pressure are required as airspeed builds up. (r = .61)

[TABLE 2: Intercorrelations among the 24 maneuvers. The table is printed sideways, and its entries are not recoverable from this scan.]

TABLE 3
CENTROID FACTOR LOADINGS*

Maneuver
1. Straight and Level
2. 90° Climbing Turn
3. Level-Off from Climbing Turn
4. Gliding Turn
5. Level-Off from Gliding Turn
6. Take-Off
7. Coordination Exercise
8. Straight and Level Gear Check
9. Traffic Pattern at Auxiliary Field
10. Rectangular Pattern
11. Three-Point Landing
12. Climbing Turn from Level
13. Landing Characteristic Stall
14. Power-On Stall
15. Approach to Stall
16. Power-Off Stall
17. Steep Turn (360°)
18. Maximum Performance Climbing Turn
19. Spin
20. Rudder Control Stall
21. Slow-Flight Turn
22. Slow-Flight Recovery
23. Forced Landing
24. Traffic Pattern at Home Field

[The factor loading columns are not recoverable from this scan.]
* Rounded to two places with decimals omitted.

23. Forced landing: From any attitude and loca- 
tion the pilot is required to select an emergency 
landing area, to accomplish certain procedures and 
to plan and accomplish a flight pattern that will en- 
able him to land in the selected area. The primary 
requirement is the effective planning and accomplish- 
ing of procedures while maintaining control of air- 
craft and orientation with respect to the ground. 
(r = .56) 

24. Traffic pattern at home field: This maneuver
is very similar to Maneuvers 9 and 10 in that the
student must attend and respond to cues outside of
the aircraft in coordinating and choosing his control
movements. However, the cues here are more
familiar. (r = .75)


Data Analysis Procedures


The correlations among these maneuver scores
were obtained and are presented in Table 1. Table 2
presents the centroid factors extracted by the Thur-
stone Method (Thurstone, 1947). Orthogonal rota-
tions to simple structure were made “blind” by an
analytical procedure programmed for an IBM 650.
Table 3 presents the rotated factor matrix.
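The centroid extraction mentioned above can be illustrated with a short sketch. It computes one Thurstone centroid factor (each loading equals its column sum divided by the square root of the grand sum) and the residual matrix from which the next factor would be drawn. This is not the authors' program: the function name, the default communality estimate (largest off-diagonal correlation in each column), and the omission of the sign reflections required for later factors are all assumptions.

```python
import math


def centroid_factor(r, communalities=None):
    """Extract one Thurstone centroid factor from a symmetric correlation matrix.

    r: list of lists of correlations (the diagonal is ignored).
    communalities: optional diagonal values; if omitted, each diagonal cell is
    replaced by the largest absolute off-diagonal correlation in its column
    (a common estimate, assumed here). No sign reflection is performed, so the
    sketch is only meaningful when the column sums are positive.
    Returns (loadings, residual_matrix).
    """
    n = len(r)
    m = [list(row) for row in r]
    for j in range(n):
        if communalities is not None:
            m[j][j] = communalities[j]
        else:
            m[j][j] = max(abs(m[i][j]) for i in range(n) if i != j)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    grand_total = sum(col_sums)
    loadings = [s / math.sqrt(grand_total) for s in col_sums]
    # Residual correlations after removing this factor; later factors start here.
    residual = [[m[i][j] - loadings[i] * loadings[j] for j in range(n)] for i in range(n)]
    return loadings, residual


if __name__ == "__main__":
    # Hypothetical one-factor data: r_ij = a_i * a_j with a = (.8, .6, .5).
    a = [0.8, 0.6, 0.5]
    r = [[a[i] * a[j] for j in range(3)] for i in range(3)]
    loadings, _ = centroid_factor(r, communalities=[x * x for x in a])
    print([round(x, 2) for x in loadings])  # [0.8, 0.6, 0.5]
```

With exact one-factor data and the true communalities on the diagonal the sketch recovers the generating loadings; with estimated communalities it only approximates them.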


RESULTS 


The factor interpretations follow. We have 
listed loadings above .30. 
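The listing rule stated above (loadings above .30) amounts to a simple filter over one column of the rotated matrix. A minimal sketch, assuming the loadings are held in a dictionary keyed by maneuver; the function name and the example names and values are hypothetical, not taken from Table 4.

```python
def salient_loadings(column, threshold=0.30):
    """Return (maneuver, loading) pairs with loading above threshold,
    sorted from largest to smallest loading."""
    kept = [(name, x) for name, x in column.items() if x > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical loadings on one rotated factor (not values from this study).
    factor = {"Maneuver A": 0.43, "Maneuver B": 0.12, "Maneuver C": 0.46, "Maneuver D": -0.35}
    for name, x in salient_loadings(factor):
        print(f"{name:<12} {x:.2f}")
```

Following the text literally, only positive loadings above the cutoff are listed; a negative loading such as -.35 is dropped.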


Factor Interpretations 


Factor I is best measured by those maneuvers
which place a premium on highly controlled, but
not overcontrolled, muscular movements. Many of
these maneuvers emphasize a sensitive touch on the
rudder controls, but hand-arm control movements
are also involved. This factor seems highly similar
to one previously identified as general to a variety
of psychomotor tests emphasizing highly controlled
movements (Fleishman, 1957a, 1958b; Fleishman &
Hempel, 1956). Originally this factor was called
Psychomotor Coordination I or Fine Control
Sensitivity. As the nature of this factor became
better understood through subsequent research, the
name Control Precision was introduced as more
appropriately descriptive (Parker & Fleishman, 1959).


Variable
No.   Maneuver                             Loading
13    Landing characteristic stall           .68
 2    90° climbing turn                      [illegible]
 6    Take-off                               [illegible]
17    Steep turn (360°)                      .47
15    Approach to stall                      .45
11    Three-point landing                    .42
20    Rudder control stall                   .41
 7    Coordination exercise                  .41
21    Slow flight turn                       .31


Factor II seems best defined by maneuvers which
emphasize Spatial Orientation (see Michael,
Guilford, Fruchter, & Zimmerman, 1957). Judgments
about one’s location in three-dimensional space
seem especially critical in maneuvers such as Forced
Landings, Flying Traffic Patterns, Climbs, and Turns.
A number of the maneuvers loading on this factor
also emphasize knowledge and integration of rules
and procedures, but the Spatial aspect seems the
more general feature.


Variable
No.   Maneuver                             Loading
23    Forced landing
 3    Level-off from climbing turn
22    Slow flight recovery
 8    Straight and level gear check
 9    Traffic pattern at auxiliary field
10    Rectangular pattern
 2    90° climbing turn
24    Traffic pattern at home field
 4    Gliding turn
20    Rudder control stall
(Loadings not legible in the scan.)


TABLE 4

ROTATED FACTOR LOADINGS a, b

[Loadings of the 24 maneuvers on the six rotated factors; the column values are too garbled in the scan to reconstruct.]

a Rounded to two places with decimals omitted.
b Factors are interpreted as I, Control Precision; II, Spatial Orientation; III, Multilimb Coordination; IV, Response Orientation; V, Rate Control; and VI, Kinesthetic Discrimination.
Factor III includes maneuvers which emphasize the
coordinated use of multiple limbs: two hands, two
feet, or combinations of feet and hands. This
corresponds to the factor called Multilimb
Coordination in certain laboratory research
(Fleishman, 1958b; Parker & Fleishman, 1959). In
one study it was called Psychomotor Coordination II
(Fleishman & Hempel, 1956). This factor and Factor I
apparently are components of the Psychomotor
Coordination factor found valid for pilot selection
during World War II (Fleishman, 1953; Guilford,
1947). Both of these components have been found
valid in subsequent studies (Fleishman, 1956b;
Fleishman & Hempel, 1956).


Variable
No.   Maneuver                             Loading
 7    Coordination exercise                  .60
12    Climbing turn from level               .56
 4    Gliding turn                           .51
 6    Take-off                               [illegible]
23    Forced landing                         .49
14    Power-on stall                         .45
24    Traffic pattern at home field          .43
 5    Level-off from gliding turn            .42
 3    Level-off from climbing turn           .39
 1    Straight and level                     .38
21    Slow flight turn                       .37
22    Slow flight recovery                   .36
15    Approach to stall                      .34
16    Power-off stall                        [illegible]

Factor IV contains many of the same maneuvers as
those loading on Factor II. For example, the traffic
pattern maneuvers appear prominent. A tentative
interpretation is that this factor corresponds to the
Response Orientation factor previously identified
(Fleishman, 1956b, 1957a, 1957b, 1958b; Fleishman &
Hempel, 1956; Parker & Fleishman, 1959). The
essential feature of this factor is the ability to make
rapid response decisions under rapidly changing
stimulus conditions. The rapid selection of controls
and their proper directional manipulation in response
to cues which change from moment to moment is
critical. An alternative hypothesis is that this factor
represents procedural integration of some kind.


Variable
No.   Maneuver                             Loading
 9    Traffic pattern at auxiliary field     .49
24    Traffic pattern at home field          .48
 5    Level-off from gliding turn            .48
10    Rectangular pattern                    .45
16    Power-off stall                        .43
 1    Straight and level                     .42
21    Slow-flight turn                       .41
22    Slow-flight recovery                   .41

Factor V is confined to fewer maneuvers, 
but there appears to be a feature common to 
those maneuvers with the highest loadings. 
In these maneuvers responses are made in re- 
lation to anticipations of velocity and rate 
changes. Moreover, these judgments are based 
on visual feedback from the outside environ- 
ment (e.g., the horizon) rather than from the 
feel of the controls or from instrument data. 
If this tentative interpretation is correct then 
this factor corresponds to a factor which has 
been called Rate Control in analyses of labo- 
ratory psychomotor tasks (Fleishman, 1958b; 
Fleishman & Hempel, 1955, 1956). 


Variable
No.   Maneuver                             Loading
17    Steep turn (360°)                      .46
19    Spin                                   .43
 1    Straight and level                     .41
11    Three-point landing                    .35
 8    Straight and level gear check          .34
14    Power-on stall                         .33
 9    Traffic pattern at auxiliary field     .31


Factor VI groups most of the maneuvers which
emphasize “stalls” and slow movements of the
aircraft. Pilots often describe the control
characteristics of these maneuvers as “muddy” or
“soft”; that is, there is increased lag in the response
of the aircraft to the control movements made; there
is a “mushiness” in the controls. It thus appears that
these maneuvers emphasize “kinesthetic feedback.”
There is no direct counterpart to this factor
encountered in the laboratory research unless it is
the “Postural Discrimination” factor previously
identified (Fleishman, 1954). One is tempted to
identify the present factor with the “flying by the
seat-of-one’s pants” ability to which pilots often
refer. For the present, however, we shall employ the
tentative name of Kinesthetic Discrimination.


Variable
No.   Maneuver                             Loading
16    Power-off stall                        .61
 4    Gliding turn
15    Approach to stall
19    Spin
14    Power-on stall
20    Rudder control stall
13    Landing characteristic stall
23    Forced landing
12    Climbing turn from level
 3    Level-off from climbing turn
21    Slow-flight turn
(Remaining loadings not legible in the scan.)


DISCUSSION 


A word about the interpretation of the fac- 
tors is relevant here. The authors are aware 
of the limitations and hazards involved in the
factor interpretation procedure used. Assigning
meaning to factors always involves a certain
amount of arbitrary decision making
wherein the decision rules are not easy to 
spell out. In the present instance, interpreta- 
tions were made with the assistance of a psy- 
chologist who was also a skilled pilot and 
thoroughly familiar with the maneuvers in- 
volved. In a sense, one might say that he 
“flew” the factors, or at least he “empathized” the
operations of the pilot and aircraft while performing
the maneuvers. In going about the interpretations it
was at first thought the factors might be
interpretable in terms of common subtask operations
or requirements. Alternative possibilities included
common control movements, or control-display
relationships. The fact that most maneuvers were
factorially complex did not make interpretation any
easier. Initially, the pilot-psychologist looked for
such evidence of commonality as “Do the maneuvers
on this factor all involve application of power?” or
“Do they all involve lining up ground reference
points?”

The important point is that there was no 
explicit objective or attempt, initially, to de- 
fine these factors in terms of more basic abil- 
ity constructs. However, after repeated fail- 
ure to “make sense” out of the blind rota- 
tions, descriptions in terms of ability factors 
were attempted. It appeared that this level 
of description best fitted the data. In other 
words, the ability model developed from ex- 
perimental-correlational analyses of labora- 
tory tasks seemed most adequate for describ- 
ing the common requirements of these air- 
craft maneuvers. 
In support of this factor interpretation, reference
is made to an earlier report of a cluster
analysis of measures obtained from an instru- 
ment flight check battery (Butler, Bamford, 
Kautz, & Ornstein, 1954). The measures were 
77 individual items (i.e., parts of maneuvers) 
and they were clustered through 14 successive 
iterations in a modified Tryon analysis. As
a result, 5 clusters and 23 residuals were
produced. Attempts to identify the clusters 
“in terms of constructs meaningful to the 
psychologist and/or pilot” at that time were 
unsuccessful. The authors, in fact, stated that 
“this approach was abandoned as fruitless.” 
After the completion of the present study an 
inspection was made of this earlier work. In- 
dications were that each of these five clusters 
was readily identifiable with one of the six 
factors resulting from the present analysis. 
The factor having no counterpart in the previ- 
ous study is the one termed Kinesthetic Dis- 
crimination. This is hardly surprising in view 
of the fact that the cluster analysis was per- 
formed on items from a set of instrument fly- 
ing maneuvers which did not include any of 
the stall series—here found to be the defining 
maneuvers in the Kinesthetic Discrimination 
factor. 

It would have been ideal if the same stu- 
dent pilots in our study had also taken the 
reference battery of ability tests which origi- 
nally identified the factors described. This, of 
course, was not possible. However, Ornstein 
(1954) correlated the “Pilot Aptitude Index” 
composite of the Aircrew Classification Bat- 
tery with performance on the maneuvers in 
the present study. This Aptitude Index is a 
weighted composite of eight aptitude tests, 
and at that time 60% of the weighting com- 
prised psychomotor test scores. The tests in 
use are known to measure the Control Pre- 
cision and Multilimb Coordination factors 
(Fleishman, 1956b). These results showed 
that of the seven Aptitude Index-Maneuver 
correlations greater than .50 (corrected for 
maneuver reliability), four of the maneuvers 
are loaded on our Factor I (Control Precision) and
four on our Factor III (Multilimb Coordination).
(One maneuver appears in both factors.) Thus, we
find further support for the present interpretations.
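The phrase “corrected for maneuver reliability” most plausibly refers to the classical correction for attenuation in one variable, which divides the observed correlation by the square root of the criterion’s reliability. The source does not state the formula, so the sketch below is an assumption; the function name and the figures in the example are hypothetical.

```python
import math


def correct_for_unreliability(r_observed, reliability):
    """Classical correction for attenuation in one variable only:
    r_corrected = r_observed / sqrt(reliability of the criterion)."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must lie in (0, 1]")
    return r_observed / math.sqrt(reliability)


if __name__ == "__main__":
    # Hypothetical figures: observed r = .40, maneuver reliability = .64.
    print(round(correct_for_unreliability(0.40, 0.64), 2))  # 0.5
```

Only the maneuver side is corrected here, matching the wording of the text; the Aptitude Index is left uncorrected.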

More than 13 years ago Neal Miller summarized the
wartime research on pilot training and proficiency
measurement (Miller, 1947). At that time, Miller
stated:


It may be that attempts to make a more penetrating
analysis of flying skill will not be profitable until
knowledge of simpler psycho-motor skills, in
situations which are easier to control, has been
increased and a clearer idea is developed of the
general structure of human perceptual, motor, and
intellectual abilities.


This prediction is especially interesting in 
view of our results. Information about the 
general structure of perceptual-motor abilities 
was not available in 1947. Much of the basic 
research has been done within the last eight 
years. While there are obvious limitations in 
our conclusions, we would have been at a loss 
to interpret our factors meaningfully without 
the ability concepts developed from this basic 
laboratory research. Thus, this study provides 
additional evidence of the usefulness of this 
ability framework in describing complex op- 
erational skills. 


SUMMARY 


Measures of flying proficiency in 24 sepa- 
rate maneuvers were obtained on a sample of 
student pilots. The intercorrelations among 
these maneuver performances were subjected 
to factor analytic study. The interrelation- 
ships were best interpreted in terms of ability 
factors, most of which had been identified 
previously in laboratory studies of experi- 
mental perceptual-motor tasks. The factors 
were identified as Control Precision, Spatial 
Orientation, Multilimb Coordination, Re- 
sponse Orientation, Rate Control, and Kines- 
thetic Discrimination. The results seem to in- 
dicate the usefulness of such ability cate- 
gories in describing complex skills. Similar 
analyses of the interrelationships among com- 
ponent performance measures of other com- 
plex jobs may provide one way of defining 
the ability requirements underlying profi- 
ciency in those jobs. 
The most widely used method for determining
the qualifications for a job is to estimate
the traits required on the basis of job
analysis. The job analyst usually is provided 
with a standard list of traits and he indicates 
which are required by the job, and often he 
quantifies his judgments to indicate the im- 
portance or amount of the traits required, as 
in the Viteles Job Psychograph, the USES 
Worker Characteristics Checklist, and the 
Minnesota Occupational Rating Scales. The 
United States Employment Service has re- 
cently published such estimated trait require- 
ments for 4,000 jobs, using a newly devised 
approach which includes the demands of apti- 
tudes, interests, temperaments, physical ca- 
pacities, working conditions, and training 
time (Fine & Heinz, 1957; Studdiford, 1953; 
USES, 1956). 

The problems and procedures of the esti- 
mating approach have been ably discussed by 
Ghiselli and Brown (1955) and Thorndike 
(1949). The ability of job analysts to identify 
specific requirements by different methods has 
been studied by Rupe (1952, 1957). Ghiselli 
and Brown (1955) and Trattner, Fine, and 
Kubis (1955) have reported on the reliability 
and validity of such judgments. McCormick, 
Finn, and Scheips (1957) have factor ana- 
lyzed the requirements of the new United 
States Employment Service system and have 
extracted patterns of such requirements. 

The present study investigates the extent 
to which estimated trait requirements can be 
said to constitute a scalable domain, in the 
sense proposed by Guttman (1950). That is 
to say, do such commonly used requirements 
as Verbal Ability and Motor Speed represent 
unidimensional attributes on which jobs can 


1 The opinions expressed in this paper are those of 
the writers and do not necessarily reflect those of the 
United States Employment Service or the Depart- 
ment of Labor. 


be placed in an unambiguous rank order? 
Unless such traits comprise unidimensional 
scales, it does not make sense to speak of one 
job as requiring more or less of the trait than 
some other job; nor can job analysts be un- 
ambiguously ranked in order of their sensi- 
tivity in judging the requirement. If, on the 
other hand, analysts’ judgments prove scal- 
able, then jobs can legitimately be ranked in 
terms of how much of the trait they require, 
while analysts can be meaningfully ranked in 
order of their perceptual sensitivity to the re- 
quirement. In the event of scalability or quasi- 
scalability, there is the additional question of 
whether analysts’ perceptual sensitivity is a 
general ability or specific to the requirement 
being rated. 


PROCEDURE 


Seven experienced and trained job analysts of the 
United States Employment Service rated 50 jobs on 
33 requirements grouped into three classes: (a) apti- 
tudes, (b) interests, and (c) personality. Analysts 
were provided with definitions of each requirement, 
together with “bench mark” jobs to serve as guid- 
ing examples. 

The 10 aptitude requirements were: Verbal, Nu- 
merical, Spatial, Form Perception, Aiming (Eye- 
Hand Coordination), Motor Speed, Finger Dexterity, 
Manual Dexterity, Eye-Hand-Foot Coordination, 
and Color Discrimination. These are identical in 
name and definition to those used by the United 
States Employment Service in its recent study of 
4,000 jobs, with the exception that in the latter re- 
port Aiming is called Motor Control, and Motor 
Speed is omitted altogether. These aptitudes also cor- 
respond in name to those measured by the General 
Aptitude Test Battery (GATB) (Dvorak, 1947), ex- 
cept that Clerical Speed and Intelligence are omitted 
in the present study. 

The 10 interest requirements were identical in name 
and definition to those used in the recent report by 
the United States Employment Service (Fine, 1957) 
and correspond to the interest factors identified by 
Cottle (1950) in a factor analysis of the Kuder 
Preference Record, the Strong Vocational Interest 
Blank, the Minnesota Multiphasic Personality In- 
ventory, and the Bell Adjustment Inventory. They 
were: Things and Objects; People and Ideas; Business
Contacts; Scientific; Routine Concrete Organized
Activities; Abstract Creative Activities; Social
Welfare Activities; Nonsocial Solitary Activities; 
Prestige Satisfaction; and Tangible Productive Satis- 
faction. 

The 13 personality requirements were: Versatility, 
Adaptability to Routine, Submissiveness, Dominance, 
Gregariousness, Self-Control, Self-Sufficiency, Non- 
imaginativeness, Valuativeness, Subjectivity, Objec- 
tivity, Creativity, and Rigorousness. These same re- 
quirements had been previously studied in a paper 
by Fine and Boling (1952). To assist analysts each 
trait definition was equipped not only with “bench 
mark” jobs and examples, but also with a set of 
characteristics of the trait as seen in individuals and 
a set of job elements which would indicate that the 
trait is required. These personality requirements rep- 
resent the prototype from which the Temperament 
Demands of the new United States Employment 
Service system were derived (Fine, 1957).

The jobs rated were chosen from the Dictionary 
of Occupational Titles so as to be typical of the 
American economy and to represent various levels 
of job difficulty. Analysts made their ratings using 
the method of single stimuli with two categories of 
judgment: “yes” or “no” as to whether the job
significantly required the trait. In this way each
analyst indicated on a “go, no-go” basis which of the
33 requirements applied to each job in turn. The 
estimates were made from standard United States 
Employment Service job descriptions (USES, 1956), 
since it has been found that results compare 
very favorably in reliability and validity to those 
obtained from direct observation (Trattner, Fine, & 
Kubis, 1955). 

TABLE 2

GUTTMAN (CR) AND JACKSON (PPR) COEFFICIENTS
OF REPRODUCIBILITY FOR ESTIMATES OF 33
TRAIT REQUIREMENTS ON 50 JOBS

[The aptitude rows and most CR and PPR values are not legible in the scan; surviving row labels and figures are given as printed.]

Interest Requirements
Things and Objects
People and Ideas
Business Contacts
Scientific
Routine, Concrete Act.
Abstract Creative
Social Welfare Act.
Nonsocial, Solitary Act.
Prestige
Tangible, Productive Sat.

Personality Requirements
Versatility
Adaptability to Routine
Submissiveness
Dominance                 8
Self-Control              17   12
Self-Sufficiency          41
Nonimaginativeness
Valuativeness             5
Creativity                8
Rigorousness              39
Objectivity               29
Subjectivity              5
Gregariousness            7    19

a The total number of jobs judged [...] any of the seven analysts.


The analysts’ judgments were arranged in the
typical scalogram matrix in which rows represent the
analysts and columns represent the jobs. The cells
of the matrix contained either 1 (indicating that the
job requires the trait) or zero (indicating that the
trait is not required). The rows and columns were
then permuted in order to maximize the concentration
of unit entries above the main diagonal of the
matrix. In this way the matrix was manipulated in
an attempt to approximate the ideal condition
(indicating perfect reproducibility) where all unit
entries would form a triangle in the upper right-hand
corner of the matrix. An example of such a final
arrangement is seen in Table 1, which displays the
data for Eye-Hand-Foot Coordination. This scalogram
was chosen for display simply because it was a
short table; only seven jobs were rated by any of
the analysts as requiring the trait. For just over half
of the requirements the number of jobs rated as
possessing the requirement was 25 or more. In such
a final matrix the rank order of the jobs reveals the
amount of the trait required, while the ranking of
the analysts is in order of their sensitivity to
perceiving the requirement.
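The permutation step described above can be approximated by sorting on the marginal totals: analysts by how many jobs they marked, jobs by how many analysts marked them. This is a sketch under that assumption (the source says only that rows and columns were permuted to concentrate unit entries above the diagonal); the function name and the example data are hypothetical.

```python
def scalogram_order(matrix):
    """Reorder a 0/1 analysts x jobs matrix toward the ideal Guttman triangle.

    Rows (analysts) are sorted by descending row total, most sensitive first;
    columns (jobs) by ascending column total, so the jobs most often judged to
    require the trait end up on the right. Sorting on the margins is a common
    heuristic (assumed here) and is not guaranteed to find the best permutation.
    Returns (row_order, col_order, reordered_matrix).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n_rows)) for j in range(n_cols)]
    row_order = sorted(range(n_rows), key=lambda i: -row_totals[i])  # stable for ties
    col_order = sorted(range(n_cols), key=lambda j: col_totals[j])
    reordered = [[matrix[i][j] for j in col_order] for i in row_order]
    return row_order, col_order, reordered


if __name__ == "__main__":
    analysts = ["A", "B", "C"]
    jobs = ["job1", "job2", "job3", "job4"]
    # Hypothetical "yes" (1) / "no" (0) judgments, not data from the study.
    data = [
        [0, 1, 0, 0],  # analyst A
        [1, 1, 1, 0],  # analyst B
        [0, 1, 1, 0],  # analyst C
    ]
    rows, cols, m = scalogram_order(data)
    print("analysts, most to least sensitive:", [analysts[i] for i in rows])
    print("jobs, least to most often marked:", [jobs[j] for j in cols])
    for i, row in zip(rows, m):
        print(analysts[i], row)
```

For this example the reordered rows are B [0, 1, 1, 1], C [0, 0, 1, 1], A [0, 0, 0, 1], a perfect upper-right triangle; real data rarely sort this cleanly.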

Two indices of reproducibility were computed for
the scalogram of each requirement: Guttman’s Index
of Reproducibility (CR) (1950) and Jackson’s Plus
Percentage Ratio (PPR) (Jackson, 1954; White &
Saltz, 1957). The traditional index for testing
scalability has been Guttman’s. But it is generally
recognized (White & Saltz, 1957) that this index
suffers a serious shortcoming in having no unique
minimum value. It is drastically affected (in the
present case) by the number of analysts who
attribute the requirement to the jobs, i.e., the
distribution of column totals in the scalogram. This
makes it very difficult to say how much
reproducibility is evinced by a given value of the
index. Jackson’s index was designed to overcome this
drawback. Because Guttman’s index has been widely
used, however, it was included in the present study.
Guttman has suggested a minimum CR of .90 as one
of the criteria of scalability. Jackson has tentatively
proposed a minimum of .70 for his index.

In computing both indices, the errors of
reproducibility were determined from column cutting
points, using Jackson’s method (1949). Also, the
reproducibility coefficients were based on the data
for all jobs and all analysts. None was excluded as
“error” or as “nonscale types” in an attempt to
increase reproducibility. Hence, these coefficients
may be considered as conservative estimates of
reproducibility. Table 2 shows the two indices of
reproducibility for each of the 33 requirements.
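To make the quantities in these paragraphs concrete, the sketch below computes a Guttman-style CR (one minus the proportion of error cells) together with the minimal marginal reproducibility, the value implied by the margins alone. Its error rule is a simplification (each job’s “yes” marks are expected to come from the analysts with the largest row totals) and not necessarily Jackson’s 1949 column-cutting procedure; Jackson’s PPR is not reproduced because its formula is not given here. Function names and example data are hypothetical.

```python
def reproducibility(matrix):
    """Guttman-style CR = 1 - errors / cells for a 0/1 analysts x jobs matrix.

    Simplified error rule (an assumption): a job marked "yes" by c analysts
    should ideally have been marked by the c analysts with the largest row
    totals, i.e. a column cutting point after c rows; every cell that departs
    from that ideal counts as one error.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_totals = [sum(row) for row in matrix]
    most_sensitive_first = sorted(range(n_rows), key=lambda i: -row_totals[i])
    errors = 0
    for j in range(n_cols):
        c = sum(matrix[i][j] for i in range(n_rows))
        ideal_yes = set(most_sensitive_first[:c])
        errors += sum(matrix[i][j] != int(i in ideal_yes) for i in range(n_rows))
    return 1 - errors / (n_rows * n_cols)


def minimal_marginal_reproducibility(matrix):
    """Reproducibility obtained by guessing each analyst's modal answer:
    the mean over analysts of max(p, 1 - p), p = share of jobs marked "yes"."""
    n_cols = len(matrix[0])
    shares = [sum(row) / n_cols for row in matrix]
    return sum(max(p, 1 - p) for p in shares) / len(shares)


if __name__ == "__main__":
    perfect = [[0, 1, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
    print(reproducibility(perfect))  # 1.0
    # Hypothetical sparse data: each analyst marks a different single job of 20.
    sparse = [[0] * 20 for _ in range(3)]
    sparse[0][0] = sparse[1][5] = sparse[2][9] = 1
    print(round(reproducibility(sparse), 3),
          round(minimal_marginal_reproducibility(sparse), 3))  # 0.933 0.95
```

The sparse example illustrates the shortcoming noted above: the analysts never agree on a single job, yet CR exceeds .90 simply because most cells are zero, and it still falls below the marginal floor of .95.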


RESULTS AND DISCUSSION


It should be remembered in interpreting 
Table 2 that in addition to reproducibility, 
Guttman has proposed three other criteria to 
be met in evaluating the hypothesis of scal- 
ability in the case of dichotomous items: (a) 
range of marginal frequencies, (b) random 
scatter of errors, (c) homogeneity of content. 
Inspection of the 33 scalograms indicated 
that in general these three criteria are satis- 
factorily met. Consequently the problem of 
testing scalability is largely a matter of 
whether reproducibility meets the required 
minimum value (.90 for CR and tentatively 
.70 for PPR). 

It is evident that there are marked differ- 
ences in scalability among the three classes 
of trait requirements. Interests are clearly
the most scalable. All interest requirements
meet the criterion of acceptable reproducibility
on both indices, with the possible exception of the
near-miss of Routine Concrete Activities on CR.
Personality requirements are next in order. Here,
over half (7 out of 13) meet the minimum value on
PPR, while 11 meet the minimum value required for
CR. Aptitude requirements are the least scalable;
each index accepts only 3 of the 10 aptitudes, and
the two sets of 3 are largely the same.

Those requirements which fail to meet the 
criterion of reproducibility appear to be quasi- 
scales, although it was somewhat difficult to 
judge with confidence whether the patterns of 
error displayed the gradient property stipu- 
lated by Guttman for quasiscales (Guttman, 
1950). In quasiscales there is not a single 
factor operating, but there is a single domi- 
nant factor and a large number of small ran- 
dom factors. While quasiscales do not permit 
an unambiguous ranking of jobs (and ana- 
lysts), the rank orders are perfectly efficient 
in correlating with some outside variable 
(Guttman, 1950). 


Unity of Analyst Sensitivity 


The extent to which a trait requirement is 
scalable also indicates the extent to which 
analysts’ sensitivity to the trait is a unidimen- 
sional ability. However, scalability does not 
indicate the degree to which such ability is a 
general ability or specific to each trait re- 
quirement. The question, then, is: to what 
extent is the rank order of analysts on sensi- 
tivity the same for all trait requirements? 
To answer this question, the rankings of the 
seven analysts were intercorrelated across 
traits. The average rank order correlation 
(rho) among traits was computed separately 
for each of the three classes of requirements, 
using the method described by Woodworth 
(1954). In other words, traits (not raters) 
were intercorrelated. The scalogram for each 
trait ranks the seven raters with respect to 
“sensitivity to perceive the trait.” In Table 1,
for instance, Analyst E is most sensitive because
he identifies five of the 50 jobs as having the
trait “Eye-Hand-Foot Coordination.”
Analyst A is least sensitive since he perceives
this trait in only two jobs. These rank orders 
of analysts, if intercorrelated, would show the 
extent to which the analysts had the same 
relative sensitivity on all traits. The pro- 
cedure is analogous to intercorrelating a set 
of tests to determine whether a general or 
specific factor is involved. Note that if a high 
average intercorrelation is obtained it means 
that the ability to rate is the same for all 
traits and that its reliability is high; but it 
does not mean that the traits themselves are 
necessarily the same. If the average intercor- 
relation is low, it means that there is no 
single ability to rate; there are no implica- 
tions for reliability since a low average inter- 
correlation is consistent with either high or 
low reliability. We are correlating traits, not 
raters; hence a low average intercorrelation 
indicates only that the rank order of raters 
(in sensitivity) varies from trait to trait. In- 
asmuch as scalability is direct evidence of 
reliability of the internal type (Guttman, 
1950), a low average intercorrelation would 
indicate the sensitivity is a reliable variable 
within each trait, but that the nature of the 
sensitivity demanded varies according to the 
trait being rated. 
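The average rank-order correlation described above can be obtained either by averaging Spearman’s rho over every pair of traits or, equivalently for untied ranks, from Kendall’s coefficient of concordance W through mean rho = (mW - 1)/(m - 1), where m is the number of rankings. Whether Woodworth’s (1954) procedure takes exactly this route is an assumption; the sketch below, with hypothetical function names and rankings, shows that the two routes agree.

```python
from itertools import combinations


def spearman_rho(x, y):
    """Spearman rho for two untied rankings (ranks 1..n) of the same objects."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def mean_pairwise_rho(rankings):
    """Average rho over every pair of rankings (here: every pair of traits)."""
    pairs = list(combinations(rankings, 2))
    return sum(spearman_rho(a, b) for a, b in pairs) / len(pairs)


def mean_rho_via_w(rankings):
    """Same average from Kendall's W: mean rho = (m * W - 1) / (m - 1)."""
    m, n = len(rankings), len(rankings[0])
    rank_sums = [sum(r[k] for r in rankings) for k in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((t - mean_sum) ** 2 for t in rank_sums)
    w = 12 * s / (m * m * (n ** 3 - n))
    return (m * w - 1) / (m - 1)


if __name__ == "__main__":
    # Hypothetical sensitivity ranks of seven analysts (A..G) on three traits.
    trait_rankings = [
        [1, 2, 3, 4, 5, 6, 7],
        [2, 1, 3, 5, 4, 7, 6],
        [7, 6, 5, 4, 3, 2, 1],
    ]
    print(round(mean_pairwise_rho(trait_rankings), 3))  # -0.333
    print(round(mean_rho_via_w(trait_rankings), 3))     # -0.333
```

Analysts tied in a scalogram would need average ranks, and with ties the W identity holds only approximately.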

The resulting average rhos were −.047 for
Interests, .06 for Personality, and .06 for
Aptitudes. The average for all 33 require- 
ments was .18. None of these of course is 
significantly greater than zero. It is clear that 
sensitivity to perceiving a trait is highly spe- 
cific to the trait; there appears to be no gen- 
eral across-trait sensitivity. Thus, while sensi- 
tivity is apparently a single or at least domi- 
nant variable, it is a different variable for 
each trait. 


SUMMARY 


To determine the scalability (in the Gutt- 
man sense) of 33 estimated worker require- 
ments, seven analysts rated 50 jobs on a 
“go, no-go” basis as to whether the require- 
ment was involved. The resulting scalograms 
were permuted to maximize the cumulative 
property and the Guttman and Jackson in- 
dices of reproducibility were computed for 
each requirement. 

Almost all of the 10 interest requirements 
proved to have acceptable scalabilities. Over
half of the 13 personality requirements were 
scalable, while only 3 of the 10 aptitude re- 
quirements proved scalable. 

In the case of scalable estimates, the scalo- 
gram not only ranks the jobs in order of the 
amount of the trait required, but also ranks 
job analysts in order of their “sensitivity” to 
perceiving the requirement in jobs. The gen- 
erality—specificity of analyst sensitivity was 
investigated by determining the extent to 
which the rank order of analysts was the 
same for all traits. The average correlation 
of the rankings of the seven analysts across 
all 33 requirements was not significantly 


greater than zero, indicating that analyst 
sensitivity is not a general ability but is 
highly specific to the requirement. 
The Strong Vocational Interest Blank
(SVIB) was standardized on groups of adults 
employed in a variety of occupations, and the 
scale for any occupation is based on the re- 
sponse differences of persons employed in 
that occupation and of a representative group 
of employed adults called a “men-in-general”
group. 

Studies by Strong (1943, 1955), McArthur 
(1954), and Berdie (1955) have shown the 
extent to which the scores of college students
are related to their occupations as adults. 
These studies support the use of the SVIB in 
counseling college students. 

The blank increasingly is used by high 
school counselors, usually as they counsel 
twelfth-grade students. In Minnesota, the an- 
nual reports of the State-Wide Testing Pro- 
grams show that the number of high schools 
making use of the Strong blank from 1952 
through 1959 increased by almost one-third, 
and the number of seniors taking the blank 
increased from less than 10,000 to more than 
16,000. Little evidence, however, has been 
available which indicates that adult men em- 
ployed in different occupations and having 
characteristic interests in college or as adults 
had similarly differentiating interests in 
Grade 12. 

Information derived from the Minnesota 
State-Wide Testing Programs bears directly 
on this question. About one-third of graduat- 
ing seniors from Minnesota high schools take 
the interest blank in Grade 12. In some 
schools, all seniors are tested, but in many 
schools the blank is given only to students 
who show some likelihood of entering occu- 
pations or professions for which there are in- 
terest scales. In some schools, the blank is 
given only to students considering college. 
Thus, although only one-third of all high
school seniors take the blank, this perhaps
includes two-thirds of all students for whom
the blank is most appropriate.

In 1959, 39 physicians who were graduates 
of the University of Minnesota Medical 
School were identified who had taken the 
SVIB when they were in Grade 12 in 1950, 
1951, or 1952; and 52 graduates of the Uni- 
versity of Minnesota Law School were identi- 
fied who had taken the blank in high school 
between 1949 and 1951. Thirty-two gradu- 
ates with majors in accounting from the 
School of Business Administration also were 
identified who had taken the blank as high 
school seniors from 1949 through 1955. These 
students (all men), when tested in high 
school, were 17 or 18 years of age, were Min- 
nesota residents, and were highly selected on 
the basis of ability and high school achieve- 
ment. All of them successfully completed their 
work in the professional schools of the Uni- 
versity, obtained their professional degrees, 
and presumably entered professions for which 
they had been trained or entered closely re- 
lated jobs. Later follow-up will be needed to 
learn more of the actual occupations of these 
graduates. 


METHOD 


The three occupational groups were compared, first on the basis of their scores on 11 selected SVIB scales, and then on the basis of interest patterns. The selection of the scales was based on judgments as to the appropriateness of the scale for the occupations in question and the usefulness of the information for counselors. The method of profile analysis used was the one described by Darley (1941). A primary pattern was identified when more than one-half of the scales in a group had scores of either A or B+, a secondary pattern when more than one-half had B+ or B, and a tertiary pattern when more than one-half had B or B−. The classification of patterns was done on a strictly counting basis, with no attempt being made to introduce the judgments of raters as described by Darley.
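The counting rule above can be sketched in Python. This is a minimal illustration of the strict counting rule only, not Darley's (1941) full procedure with rater judgments. It writes B− as the ASCII string "B-", treats "more than one-half" as strictly greater than half, and assumes (the text does not say) that a group meeting more than one rule receives the strongest label. The function name `darley_pattern` is hypothetical.

```python
from collections import Counter


def darley_pattern(grades):
    """Classify one interest group by the strict counting rule.

    grades: letter grades ("A", "B+", "B", "B-", "C+", "C") for every
    scale in the interest group.  Returns "primary", "secondary",
    "tertiary", or None when no pattern is present.
    """
    n = len(grades)
    counts = Counter(grades)

    def more_than_half(*letters):
        # "More than one-half" is taken as strictly greater than n / 2.
        return 2 * sum(counts[g] for g in letters) > n

    # Assumption: rules are checked from strongest to weakest, so a group
    # that satisfies more than one rule gets the strongest label.
    if more_than_half("A", "B+"):
        return "primary"
    if more_than_half("B+", "B"):
        return "secondary"
    if more_than_half("B", "B-"):
        return "tertiary"
    return None


# Hypothetical profile for a four-scale interest group
print(darley_pattern(["A", "B+", "B+", "C"]))  # primary
```

In the comparisons that follow, students whose result is "primary" or "secondary" would be pooled against those with "tertiary" or no pattern.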

Tests of significance (χ²) were computed by comparing high-scoring and low-scoring students in each of the three groups. Scores of A and B+ were considered high; scores of B, B−, C+, and C were considered low. Thus, in the analysis of the 11 selected scales, physicians and lawyers were compared on the basis of the number obtaining high scores and the number obtaining low scores on each scale. In comparing occupational groups on the basis of patterns, students with primary or secondary patterns were combined and compared to students with either tertiary or no patterns.

TABLE 1

PERCENTAGE OF 39 PHYSICIANS, 52 LAWYERS, AND 32 ACCOUNTANTS RECEIVING VARIOUS LETTER GRADES ON 11 SELECTED STRONG VOCATIONAL INTEREST BLANK SCALES

[Table entries not recoverable from the source scan.]


RESULTS 


Table 1 presents the percentages of the 39 physicians, 52 lawyers, and 32 accountants tested in Grade 12 who had various letter grades on the 11 selected SVIB scales. The significance of the differences is presented in Table 3. Accepting a .05 level of significance, physicians and lawyers were significantly different on 8 of the 11 scales; no significant differences were found between these two groups on the aviator, personnel director, and public administrator scales. Fifty per cent of the lawyers obtained scores of A or B+ on the lawyer scale, as compared to 18% of the physicians, whereas only 2% of the lawyers obtained A or B+ on the physician scale, as compared to 49% of the physicians. The difference between the physician and lawyer groups on the osteopath scale was as large as the difference on the physician scale. The difference between the two groups on the real estate salesman scale was at least as large as, and in fact larger than, the difference on the lawyer scale.
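The high-versus-low comparison can be illustrated with a 2 × 2 chi-square. The counts below are not the author's table entries; they are back-calculated from the percentages quoted above (50% of 52 lawyers ≈ 26; 18% of 39 physicians ≈ 7). Whether a continuity correction was applied is not stated, so the Yates correction is left optional. The helper names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square (df = 1) for the 2 x 2 table

                 high  low
        group 1   a     b
        group 2   c     d
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Optional Yates continuity correction
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square with 1 df: erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


# Lawyer scale: 26 of 52 lawyers vs 7 of 39 physicians scored A or B+
# (approximate counts back-calculated from the reported percentages).
chi2 = chi_square_2x2(26, 26, 7, 32)
print(round(chi2, 2), p_value_df1(chi2) < 0.01)  # 9.91 True
```

With these approximate counts the difference is significant beyond the .01 level, with or without the correction.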

The physicians and the accountants were 
significantly different, as shown in Table 3, 
on 5 of the 11 scales, three of these being 
closely related to medicine—physician, osteo- 
path, and dentist. The other two were real 
estate salesman and accountant. The real es- 
tate salesman scale differentiated the two 
groups as well as, if not better than, did the 
accountant scale. The lawyer and accountant 
groups were different on 5 of the 11 scales, in- 
cluding the lawyer scale and the accountant 
scale. The two groups were better differenti- 
ated by the lawyer scale than they were by 
the accountant scale, the former providing a 
chi square significant only at the .05 level. 

Table 2 allows comparisons to be made between the three occupational groups on the basis of frequency of twelfth-grade interest patterns. The physician and lawyer groups were significantly different, as shown in Table 3, on all groups but the technological, social science, musician, certified public accountant, and president, manufacturing concern groups. All three sets of comparisons provided significant differences on the two business groups. This suggests that, in terms of interest in business detail and business contact occupations, prospective physicians are different from prospective lawyers and from prospective accountants, and prospective accountants are different from prospective lawyers.

TABLE 3

STATISTICAL SIGNIFICANCE OF DIFFERENCES AS SHOWN BY χ² OF DIFFERENCES BETWEEN NUMBERS OF MDs, LLBs, AND ACCOUNTANTS RECEIVING HIGH SCORES (A OR B+) ON 11 SELECTED SVIB SCALES AND PRIMARY OR SECONDARY PATTERNS ON THE INTEREST GROUPS

Groups compared: MD & LLB; MD & Acct.; LLB & Acct.

Selected scales: Lawyer, Physician, Accountant, Farmer, Aviator, Engineer, Osteopath, Personnel Director, Public Administrator, Real Estate Salesman, Dentist

Interest groups: I Biological Science; II Physical Science; III Production Manager; IV Technological; V Social Science; VI Musician; VII Certified Public Accountant; VIII Business Detail; IX Business Contact; X Verbal-Language; XI President, Mfg. Concern

[Significance entries not recoverable from the source scan.]

* Significant between .05 and .01.
** Significant at or beyond .01.

These differences between groups could be explained as resulting from counseling that directed pupils into occupations compatible with their Strong scores. The extent of this influence is unknown, but an earlier study of college freshmen (Berdie, 1955) suggested that it may not be important.


DISCUSSION 


The results leave little question that per- 
sons in these three occupational groups had 
significantly different measured interests when 
they were completing high school. These dif- 
ferences in interests, however, are not always 
of the kind that would have been predicted 
on the basis of previous research. Prospective 
physicians seem to have the interests of adult 
osteopaths to almost the same extent to which 
they have the interests of adult physicians. 
The prospective lawyers included in this sample seem to have the interests of adult real estate salesmen more than those of adult lawyers.

We might assume that most of the gradu- 
ates of the Medical School will enter medicine 
and will unquestionably be classified as phy- 
sicians during the next 10 years. Even within 
the field of medicine, however, these gradu- 
ates will specialize, and some of the fields of 
specialty will involve interests quite different 
from the interests of most physicians. 

The situation with the lawyers and ac- 
countants is not so clear. The extent to which 
these Law School graduates and accountants 
will be classified easily as lawyers or account- 
ants during the next 10 years is unknown. 
Many of the lawyers may end up doing work 
that is essentially accounting, many may end 
up as salesmen, and many will enter other 
kinds of business jobs. Many of the account- 
ants will continue to work as accountants, but 
others will enter specialties related to ac- 
counting but not characterized by the same 
kinds of interests. Others of the accounting 
graduates will enter occupations and jobs 
only remotely related to accounting. When 
further information is available concerning 
the occupational histories of these physicians, 
lawyers, and accountants, more inferences can 
be made regarding the predictive validity of 
the SVIB. 







SUMMARY 

Three groups of University graduates, from medicine, law, and accounting, were compared on the basis of SVIB scores obtained in Grade 12. The scores of the three groups were significantly different from one another, and pattern analysis of each student’s interest profile revealed that the three groups had different profile patterns as well as different scores on the individual scales. These differences suggest that careful use of the SVIB with high school seniors is justified.
PREDICTION OF AN INTERMEDIATE CRITERION
OF COMBAT EFFECTIVENESS WITH A
BIOGRAPHICAL INVENTORY¹


PHILIP HIMELSTEIN AND THOMAS L. BLASKOVICS


University of Arkansas 


In a recent study, Torrance and Ziller 
(1957) developed a biographical inventory 
to measure risk-taking tendencies. The bio- 
graphical inventory was developed from the 
assumption that, if an individual had partici- 
pated in activities judged to require risk and 
strategy (play poker, play hooky from school, 
engage in contact sports, etc.), he would de- 
velop tendencies that would lead him to be 
more effective in combat. The scale was found 
to be effective in differentiating between aces 
and nonaces in a group of fighter-interceptor 
pilots. A revised Risk Scale was developed 
from a number of items hypothesized to be 
related to the development of attitudes fa- 
vorable to risk taking, and by increasing the 
number of items relating to risk taking in life 
situations in the past. Final selection of items 
for the scale was determined on the basis of 
an item analysis. The split-half reliability of 
the revised scale for 370 combat aircrew per- 
sonnel was found to be .98 (corrected by the 
Spearman-Brown formula). 

The present study is designed to investigate 
the revised Torrance-Ziller Scale as a pre- 
dictor of combat effectiveness among senior 
ROTC cadets. The peer-evaluation technique 
was utilized to develop an intermediate cri- 
terion of combat effectiveness. This is justi- 
fied on the basis of several validity studies 
involving peer nominations, particularly that 
of Haggerty (1953). She found that the Aptitude for Service Rating, essentially a rating by fellow cadets at the United States Military Academy, was a better predictor of combat effectiveness in Korea than any other Academy measure. Eight correlations between the rating and two criteria of combat effectiveness ranged from .28 to .52, with six reaching .40 or better. The validity of peer nominations within a military setting has been demonstrated for other variables, particularly in training situations. Webb and Hollander (1956) found that peer nominations on interest and enthusiasm in naval aviation showed a strong relationship to the pass-withdraw criterion in subsequent naval pilot training. Hollander (1954) reported that peer nominations for leadership predicted the pass-fail criterion in pilot training at a significant level. Evidently the peer-evaluation technique is a valid predictor of a wide variety of criteria.

1 This study was facilitated by a grant to the senior author from the University of Arkansas Research Fund. The authors would like to express their appreciation to Ralph T. Simpson and E. H. Murray, and their staff, of the University of Arkansas Military Science and Tactics faculty for their splendid cooperation.

This investigation is designed to test two 
hypotheses: 


1. The ROTC cadets selected by their 
peers as “best for combat” will obtain 
higher scores on the Risk Scale than those 
selected as “least effective for combat.” 

2. The ROTC cadets who select combat 
branches of the Army as their first choice 
for active duty assignment will obtain 
higher scores on the Risk Scale than those 
who select noncombat branches. 


METHOD 


Subjects. The subjects in this study consisted of the seniors enrolled in Army ROTC courses at the University of Arkansas in the fall of 1958. These seniors had been randomly assigned to one of four sections. Because some cadets were absent from one or the other of the two testing procedures described below, the N available for analysis was 57.

Procedure. The cadets were first tested with the revised Risk Scale during the regularly scheduled meeting of the section under a “research set.” Approximately two weeks later, each cadet, as part of a standard administrative procedure, made his selections for branch of military service for active duty. Two months later, all cadets participated in the peer-nomination procedure.

In the peer-nomination procedure, each cadet rated only those men in his own section. As in the administration of the Risk Scale, a “research set” was employed. The cadet nominated, in rank order, five cadets as “best for combat,” and five, also in rank order, as “poorest for combat.” At the same time, nominations were made for best and poorest leaders. The highest nominee in a particular set of ratings was given a score of +5, the next highest +4, and so on through +1. The nominee selected as the poorest was given −5, and so on through −1. Each cadet’s score consisted of the algebraic sum of all rankings within his section. A constant was added to each score to eliminate negative scores and scores of zero.
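The scoring rule above can be sketched as follows. The data layout (one pair of rank-ordered lists per rater) and the function name are hypothetical, and because the size of the added constant is not reported, the sketch assumes the smallest constant that leaves no score below 1. Cadets who receive no nominations keep a raw score of zero, so every member of the section must be listed. The same function would be called once for the combat nominations and once for the leadership nominations.

```python
def peer_nomination_scores(section, ballots, constant=None):
    """Score one set of peer nominations (e.g. combat) within a section.

    section: IDs of every cadet in the section.
    ballots: one (best, poorest) pair per rater; each list holds up to five
             cadet IDs in rank order, first = best (or poorest).
    constant: amount added to every algebraic sum; by default the smallest
              non-negative value that leaves no score below 1 (an assumption).
    """
    raw = {cadet: 0 for cadet in section}
    for best, poorest in ballots:
        for position, cadet in enumerate(best[:5]):
            raw[cadet] += 5 - position      # +5, +4, ..., +1
        for position, cadet in enumerate(poorest[:5]):
            raw[cadet] -= 5 - position      # -5, -4, ..., -1
    if constant is None:
        constant = max(0, 1 - min(raw.values(), default=0))
    return {cadet: score + constant for cadet, score in raw.items()}


# One hypothetical ballot in an 11-cadet section (c0 is the rater)
section = [f"c{i}" for i in range(11)]
ballots = [(["c1", "c2", "c3", "c4", "c5"], ["c6", "c7", "c8", "c9", "c10"])]
print(peer_nomination_scores(section, ballots))
```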


RESULTS AND DISCUSSION 


Hypothesis 1 concerned the relationship between peer ratings for combat effectiveness and scores on the Risk Scale. The results presented in Table 1 support the hypothesis that cadets who are rated high in “combat effectiveness” tend to obtain higher Risk Scale scores than those who are rated low. The obtained correlation of .41 between Risk Scale scores and peer nominations for combat effectiveness is significant at the .01 level of confidence.
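As a check on the reported significance levels, a Pearson r can be tested against zero with t = r√(n − 2)/√(1 − r²), df = n − 2. The sketch assumes the correlations were computed on the full N of 57 (df = 55). The two-tailed .01 critical value for df = 55 lies between the tabled 2.678 (df = 50) and 2.660 (df = 60), and the t values for both Risk Scale correlations exceed it. The function name is illustrative.

```python
import math


def t_for_r(r, n):
    """t statistic, df = n - 2, for testing a Pearson r against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


for label, r in [("combat", 0.41), ("leadership", 0.37)]:
    print(label, round(t_for_r(r, 57), 2))
# combat 3.33
# leadership 2.95
```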

It is interesting to note that the Risk Scale predicted peer ratings on the leadership variable almost as well as it predicted the combat ratings (r = .37, p = .01). There is also a strong positive relationship (r = .87) between ratings for combat and for leadership. This is in agreement with a recent study by Trites, Kubala, and Cobb (1959), in which a correlation coefficient of .69 was obtained between those two variables with 729 aviation cadets. Apparently these two ratings have much in common, and it seems likely that both require similar interests and experiences.


TABLE 1


CORRELATIONS BETWEEN RISK SCALE AND PEER
RATINGS FOR COMBAT AND LEADERSHIP

                            Peer Ratings
                        Combat    Leadership
Risk Scale               .41*        .37*
Peer Ratings: Combat                 .87*

* Significant at the .01 level.

The second hypothesis was concerned with the relationship between the Risk Scale and choices for branch of service. Of the 57 senior cadets, 20 selected combat branches of the Army (Infantry, Artillery, and Armor). The Risk Scale scores of this group were compared with those of the remaining cadets, whose first choice was a noncombat branch of service. The mean score of the combat group was 26.2 and, for the noncombat group, 22.2. The t test of the difference yielded t = 2.14, significant at the .05 level. The second hypothesis, therefore, can be accepted.
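The reported t can be related to the group means through the pooled-variance formula t = (M₁ − M₂)/(s_p√(1/n₁ + 1/n₂)). The standard deviations are not reported, so this sketch assumes a pooled-variance test and solves for the pooled SD that the reported t implies (about 6.7), with df = 20 + 37 − 2 = 55. This is a back-calculation for illustration, not a statistic from the study; the function names are hypothetical.

```python
import math


def pooled_t(mean1, mean2, sd_pooled, n1, n2):
    """Two-sample t using a pooled SD; df = n1 + n2 - 2."""
    return (mean1 - mean2) / (sd_pooled * math.sqrt(1 / n1 + 1 / n2))


def implied_pooled_sd(mean1, mean2, t, n1, n2):
    """Solve the same formula for the pooled SD, given a reported t."""
    return (mean1 - mean2) / (t * math.sqrt(1 / n1 + 1 / n2))


sd = implied_pooled_sd(26.2, 22.2, 2.14, 20, 37)
print(round(sd, 2), 20 + 37 - 2)  # 6.73 55
```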

The over-all results of this study are en- 
couraging, and suggest the feasibility of pre- 
dicting an intermediate criterion of combat 
effectiveness, and the likelihood of selection 
of combat assignments by ROTC cadets. The 
items in the biographical inventory appear to 
be tapping those experiences and background 
that lead to ratings of greater effectiveness as 
a combat officer and as a leader. 


SUMMARY 


The Torrance-Ziller Risk Scale, a biographical inventory, was administered to a sample of senior ROTC students. After this procedure, the cadets made choices for branch of service for active duty, and the peer-nomination technique was utilized to obtain ratings in combat effectiveness and in leadership. Scores on the Risk Scale correlated with both ratings at the .01 level of significance, and a significant difference in the expected direction was obtained between the mean scores of those selecting combat and noncombat assignments.
THE MEASUREMENT OF VOCATIONAL INTERESTS
BY A STEREOTYPE RANKING METHOD


KENNETH M. MILLER


University of Tasmania¹


A method of assessing vocational interests that is shorter than the Kuder Preference Record and the Strong scales has long been advocated (Miles, 1947). The Rothwell Interest Blank, a ranking method first developed in Australia in 1947 (Rothwell, 1947), was an attempt to meet this need. The present paper briefly describes the revision and standardization of the blank carried out by the author between 1954 and 1958.


RATIONALE OF THE BLANK 


The blank is based on the assumption that 
people have stereotyped ideas about the na- 
ture of occupations and that expressed pref- 
erences for jobs are largely determined by 
such stereotypes although these may be rela- 
tively independent of knowledge about the 
occupations. Some stereotypes may be based 
on a spectacular but uncharacteristic aspect 
of an occupation, such as the glamour at- 
tached to an air hostess’s job (ignoring such 
aspects as preparing meals and coping with 
airsick travelers). Others are likely to be 
more accurate, as in the case of a bank teller, 
seen as dealing with money and figures. 
Whether the stereotype is accurate or not, a 
person is usually responding to it when ex- 
pressing a like or dislike of an occupation. 

A further assumption is that many of these 
stereotypes belong to one of a limited num- 
ber of relatively homogeneous interest stereo- 
type categories. For instance, while outdoor 
occupations have specific stereotypes, there is 
an overall stereotype for these as requiring 
an interest in outdoor activities with a dis- 
like of being tied down to routine work or 
regularity. Originally Rothwell used nine categories, which proved too few; in the course of the various stages of the present study, these have been increased to 12. They include 10 categories similar to those in the Kuder Preference Record (Outdoor, Mechanical, Computational, Scientific, Personal Contact, Aesthetic, Literary, Musical, Social Service, Clerical) together with Practical and Medical categories. A full discussion of the reasons for selecting these categories can be found in Miller (1958). As well as adding three categories to the blank, the revision involved an extensive item analysis, on the basis of which unsatisfactory items were replaced.

1 Now at University College, London.

2 A number of senior students and teachers assisted with the administration of the blank, while Z. Rozen, C. Miller, and P. Waters carried out much of the routine statistical analysis.

The blank consists of nine panels of items (job titles), each panel containing one occupation belonging to each of the 12 categories. The position of these is varied systematically in each panel. The first panel in the male form of the blank is presented in Fig. 1. The respondent is asked to rank these items in order of preference, such preference to be based on the liking for the type of work. If a respondent is consistent, the “stereotypes” would be ranked in the same order in each block. For various reasons, complete consistency is not always found; however, the overall sum of ranks gives a clear indication of the order of interests.

Farmer
Civil Engineer
Cost Accountant
Scientist
Sales Manager
Artist
Journalist
Concert Pianist
Teacher (Primary)
Bank Manager
Carpenter
Doctor

Fig. 1. First panel in the male form of the Rothwell Interest Blank (Miller Revision)
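Scoring by summed ranks can be sketched as follows: each category's score is the sum of the ranks its job titles receive across the nine panels, so a score can run from 9 (ranked first in every panel) to 108 (ranked last in every panel), and a low score indicates a strong interest. The sketch assumes the job titles have already been converted to their categories using the blank's scoring key (not reproduced here); the category labels follow the list of 12 given above, and the function name is hypothetical.

```python
CATEGORIES = [
    "Outdoor", "Mechanical", "Computational", "Scientific",
    "Personal Contact", "Aesthetic", "Literary", "Musical",
    "Social Service", "Clerical", "Practical", "Medical",
]


def score_blank(panels):
    """Sum each category's ranks over the nine panels.

    panels: nine dicts mapping category -> rank (1 = most preferred) given
    to that category's job title in the panel.
    Returns (category, score) pairs, strongest interest (lowest score) first.
    """
    if len(panels) != 9:
        raise ValueError("expected nine panels")
    totals = dict.fromkeys(CATEGORIES, 0)
    for panel in panels:
        if set(panel) != set(CATEGORIES) or sorted(panel.values()) != list(range(1, 13)):
            raise ValueError("each panel must rank all 12 categories 1..12 once")
        for category, rank in panel.items():
            totals[category] += rank
    return sorted(totals.items(), key=lambda item: item[1])


# A perfectly consistent (hypothetical) respondent: same order in every panel
same_order = {c: i + 1 for i, c in enumerate(CATEGORIES)}
scores = score_blank([same_order] * 9)
print(scores[0], scores[-1])  # ('Outdoor', 9) ('Medical', 108)
```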


USES OF THE BLANK

The blank was designed primarily as an aid to interviewing, although norms are available for secondary school, university, and occupational groups. The raw scores provide information about the pattern of interests, that is, the rank order and relative strength and weakness of interests in terms of the 12 stereotype categories. Individual items can be usefully examined in relation to the rest of the blank.

The summated ranks indicate the pre- 
ferred and “rejected” categories, and exami- 
nation of the preferred categories can yield 
suggestions of common elements. When ap- 
parently oppositional interests are looked at 
more closely some common or compatible ele- 
ments may be found. Any such inferences 
drawn from the record are treated as hy- 
potheses for testing in the interview. 

Attention should also be given to incon- 
sistencies in the distribution of first and last 
choices, especially when all but one or two of 
these are allocated to a single category. These 
inconsistencies should also be looked at in 
conjunction with the three free-choice jobs, 
which respondents list as the final stage of 
filling in the blank. In practice it has been 
found that the first three and last three cate- 
gories provide the most useful basis for in- 
terview discussion. 

Counselors have found other uses for the 
blank, such as (a) a basis for group discus- 
sion in which the scored blanks are passed 
back to a school class, and a counselor or 
guidance officer uses them as a basis for a dis- 
cussion of work possibilities in relation to in- 
terests displayed; (b) part of a vocational
information program in Technical Schools. 
This use is reported fully by Greig (1955). 


VALIDITY 


The internal consistency of the blank was checked by intercorrelation of scores of 172 sixteen-year-old boys, 99 sixteen-year-old girls, and 210 Primary Teachers College students. In all three samples very few significant positive correlations were found. Even where these did occur, those persons having their dominant interest in one or the other of the pair were clearly differentiated in terms of scores. For example, for the boys the correlation of Mechanical with Practical is .67, yet the mean scores for those boys predominantly interested in Mechanical are 22.9 for Mechanical and 42.2 for Practical, while for those interested in Practical the scores are 23.7 for Practical and 36.7 for Mechanical (a low score indicates a strong interest; scores can range from 9 to 108). This type of relationship held throughout for males and females.

As yet little evidence of concurrent validity is available, but what there is is promising. When the mean scores for certain occupational groups are examined, the most preferred scores are found in the expected categories. For male science teachers (n = 14) the mean score on Scientific is 23.0, the next lowest score being on Medical, 41.9. In a group of engineers (n = 30) the two almost equally preferred categories were Mechanical and Scientific, 39.0 and 42.4, respectively. In a group of female clerical workers (n = 60), a mean Clerical score of 47.8 was second to Aesthetic, 42.2. (This finding of high Aesthetic preference has been obtained in all female groups so far tested.)


TABLE 1
MEAN CORRELATION OF KUDER AND ROTHWELL
CATEGORIES FOR ADULT GROUP
(157 MALES, 266 FEMALES)

Category            Males   Females
Outdoor
Mechanical
Computational
Scientific
Persuasive
Aesthetic
Literary
Musical
Social Service
Clerical
Practical*
Medical*

[Correlation values not recoverable from the source scan.]

* Rothwell Practical and Medical were correlated with Kuder Mechanical and Scientific, respectively.
Construct validity has been investigated by
correlating relevant categories of the Roth-
well Blank with corresponding scales of the
Kuder. In five adult groups (n = 89, 68, 41,
91, 134) the coefficients range from .32 for
Social Service to .88 for Musical, all being 
significant. A full report of the studies in 
which the relation between the Kuder and 
Rothwell Interest Blank has been investi- 
gated is to be found elsewhere. The mean 
correlation for each scale based on two male 
and three female samples is given in Table 1. 


Interest Patterns 


Another approach to validity was made by 
investigating the interest patterns of each of 
the dominant interest groups, i.e., the groups 
showing most preference for each category.
A Kendall test of concordance was applied to 
the ranks of the mean scores for each block 
and in all cases the resulting W coefficient 
was significant beyond the 1% level. 
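Kendall's coefficient of concordance, W = 12S / (m²(n³ − n)), where S is the sum of squared deviations of the n objects' rank totals from their mean over m rankings, can be sketched as below. How the data were arranged is not spelled out; one plausible reading is that, for each dominant-interest group, the nine blocks act as m = 9 "judges" ranking the n = 12 categories by mean score. The sketch ignores ties, and uses the large-sample test in which m(n − 1)W is referred to chi-square with n − 1 df. Function names are illustrative.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance W (no tie correction).

    rankings: m lists ("judges"), each assigning ranks 1..n to the same
    n objects, listed in the same object order.
    """
    m, n = len(rankings), len(rankings[0])
    rank_sums = [sum(judge[j] for judge in rankings) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((total - mean_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def chi_square_for_w(w, m, n):
    """Large-sample test: m(n - 1)W is approximately chi-square with n - 1 df."""
    return m * (n - 1) * w


print(kendalls_w([[1, 2, 3, 4]] * 3))      # 1.0 (perfect agreement)
print(kendalls_w([[1, 2, 3], [3, 2, 1]]))  # 0.0 (opposed rankings)
```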


RELIABILITY 


Reliability of the revised blank has been studied by both test-retest and split-half methods. For females the test-retest coefficients ranged from .44 to .94 for 113 of the above teachers college group tested after a five-month interval, from .60 to .95 for a psychology student group (n = 32) tested after a three-month interval, and from .51 to .93 for a psychology student group (n = 43) tested after a three-week interval. In these three samples the mean coefficients for all categories were .65, .81, and .82, respectively. For a male student group (n = 16) retested after a three-week interval, correlations ranged from .78 to .95, with a mean of .89.³

In addition, split-half correlations were computed for the initial test of the above groups retested after three weeks. The coefficients ranged from .71 to .92 for the females and from .76 to .95 for the males, with mean coefficients of .83 and .86, respectively.


SUMMARY


The Rothwell Interest Blank is a method of assessing vocational interests by the technique of ranking job titles representative of occupational stereotype categories. Its main use is as the basis for an interview, but the results can also be compared with the percentile norms for secondary school, university, and occupational groups. It can be administered to groups or individuals, is relatively quick, and has been shown to have promising reliability and validity, though more evidence on these aspects is required.

3 The item analysis was based on the records of 938 boys and 725 girls in senior forms of representative secondary schools throughout Tasmania (Australia) and of approximately 100 adults. Groups of college students and employed persons were the main participants in validity and reliability studies. Some of these were conducted with the penultimate form of the revision, which was subsequently changed by the substitution of a small percentage of more satisfactory items.
PERSONALITY CHARACTERISTICS AND
SAMPLE BIAS¹


LEE G. BURCHINAL 


Iowa State University 


The study of bias from nonreturns in sur- 
vey research has shown that noncooperation 
is seldom randomly associated with variables 
which may be crucial for generalizing the re- 
sults of a study. Results of several studies 
(Edgerton, Britt, & Norman, 1947; Pace, 
1939; Reuss, 1943; Suchman & McCandless, 
1940) support the generalization that return- 
ing a mail questionnaire is related to the Ss’ 
approval of the problem under study and a 
positive relationship with the agency conduct- 
ing the research. 

Very likely, cooperation in field research is 
related to other attitudinal, personality-re- 
lated variables or sociological characteristics 
of respondents. A serendipitous feature of the
methodology used in a recent study permitted 
testing differences between these types of vari- 
ables for persons who did and did not cooper- 
ate in the study. 


METHODOLOGY 


Data were obtained from 176 students who were 
enrolled in the introductory sociology courses offered 
in the Colleges of Agriculture and Science and Hu- 
manities at Iowa State University. Since the ques- 
tionnaire for the study was too long for completion 
by the average student during a regular class, the 
questionnaire was constructed in two parts. One part 
was administered during a regular class session, and 
the other part was administered during one of sev- 
eral extra evening sessions which the students were 
asked to attend. A coding system was used to assure 
anonymity and to permit combining the two por- 
tions of data for each respondent. 

“Cooperating” students (C) were defined as those who voluntarily came to the evening session and completed the second part of the questionnaire. “Noncooperating” students (NC) were those who did not complete the second part of the questionnaire at the requested time. Data presented in this discussion were obtained from the first questionnaire. No data are reported from the questionnaires completed by a portion of the students during the evening meetings.

1 This report is listed as Journal Paper No. J-3622, Project 1370, of the Iowa Agricultural and Home Economics Experiment Station, Ames, Iowa.

The writer administered the questionnaires in all 
classes and during the extra sessions. Since he was 
unknown to the students in these classes, there was 
no rapport established prior to the time of question- 
naire administration. Under these circumstances, stu- 
dents most likely felt a minimum of pressure to co- 
operate in the second portion of the study. These 
conditions approximated an experimental setting for 
studying possible differences between the C and NC 
groups of students for variables included in the in- 
vestigation. 

The data included scores based on a “traditional 
family ideology” scale, and an “authoritarian scale” 
(Huttman & Levinson, 1950), a “powerlessness” and 
an “anomie” scale constructed by Dean (1956), and 
the Gough (1949) Home Index. Tests of split-half 
reliability for the present sample indicated the scales 
were sufficiently reliable for tests of group differ- 
ences (r > .80). 

RESULTS 


The nearly even split between the two 
groups of male and female students, as shown 
in Table 1, provided relatively efficient sta- 
tistical bases for tests of mean differences. 
Three of the eight mean differences among
the personality-related variables were statistically significant. Cooperating females were
less authoritarian; cooperating men reported
lower levels of powerlessness and anomie. Although the difference between the two groups
of men was very slight for the TFI means, 
mean differences between C and NC students 
were in the same direction for all other com- 
parisons. Higher scores on all scales indicate 
greater magnitude of the variable measured. 

Sex, farm or nonfarm family of orientation, and family level of living failed to discriminate significantly between the C and NC students. Fifty-seven percent of the female students, compared to 52% of the male students, attended the evening sessions, but the corrected chi square was nonsignificant, χ² = 1.09, .20 < P < .30. More farm than nonfarm students were in the C group, 57% and 52%, respectively, but χ² = .342, .50 < P < .70. Mean scores for the C and NC groups on the level-of-living scale were not significantly different for either the male or the female tests, and the directions of the mean differences were contradictory for the two sex groups.


TABLE 1


MEANS FOR COOPERATING AND NONCOOPERATING STUDENTS

Columns (separately for women and men): Cooperators, Noncooperators, Mean Difference, SD
Rows: Home index, Traditional family ideology, Authoritarianism, Powerlessness, Anomie

[Cell values not reliably recoverable from the source scan.]
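The reported probability intervals can be checked against the chi-square distribution with 1 df, whose upper tail is P(χ² > x) = erfc(√(x/2)). This sketch only reproduces the P values from the published χ² statistics; the underlying frequencies are not given, so the chi squares themselves are not recomputed. The function name is illustrative.

```python
import math


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))


for reported, (low, high) in [(1.09, (0.20, 0.30)), (0.342, (0.50, 0.70))]:
    p = p_value_df1(reported)
    print(reported, round(p, 3), low < p < high)
```

Both reported intervals agree with the 1-df distribution.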


DISCUSSION 


Sample limitations of the study are obvious when one considers the generality of the present findings. First, a nonrandom sample of predominantly sophomore college students was used. Since introductory sociology is required in, or among the popular electives of, most curricula in the Colleges of Agriculture and Science and Humanities, there is little reason to believe that the students comprising the C and NC groups would be systematically different from the total sophomore class of the University. Specific generalization to the total undergraduate body or to nonstudent populations cannot be made, nor is it implied.²

2 For a lively discussion of the values and limitations of generalizing research findings based upon student samples, see Landis (1957) and comments by Kuhn (1957).




In view of the lack of comparable data, the 
present results have suggestive value beyond 
the limits of the sample used in the study. 
The present findings appear to indicate that 
lack of cooperation is associated with a family 
value orientation which emphasizes traditional 
male-female sex roles, power relationships, 
and conventional morality and with person- 
ality characteristics which are related to ex- 
pressions of authoritarianism, powerlessness, 
and anomie. 

These findings provide specific grounds for requiring a high rate of cooperation from respondents in survey research designs. Because rates of cooperation are generally much higher for interview methods of data collection than for mailed questionnaires, the findings probably have greater relevance for the latter.
TYPE OF MAILING AND EFFECTIVENESS OF
DIRECT-MAIL ADVERTISING¹


LEE W. COZAN 


Department of Health, Education, and Welfare 


It is generally contended that first-class mail in direct-mail advertising produces a higher response from those contacted than third-class, or junk, mail. There appears, however, to be no experimental evidence in the literature supporting this hypothesis. Consequently, a professional technical periodical decided to conduct two experiments testing the effectiveness of first- and third-class postage in its direct-mail advertising campaign.


METHOD 


Subjects. The Ss were psychologists, engineers, personnel administrators in college, government, and the military establishment, and corporation librarians located in the United States and abroad. Each of the four lists was divided into two groups to assign the treatments: the odd-numbered Ss were included in the first-class mail experiment and the even-numbered Ss in the third-class mail experiment.

Materials. The stationery was manufactured from white sixteen-pound-weight paper. The copy for the two experiments was prepared on an IBM Executive electric typewriter, whose proportional spacing gives a type face resembling printing types. Additional copies were made from the first copy letter by the offset method, using black ink. The dimensions of the outgoing materials were: envelope, 64 X 34 in.; and promotional letter, 8½ × 11 in. Colored stationery was not used in either experiment because a recent study indicated that, contrary to popular belief, the response to direct mail may not be significantly improved by colored stationery (Bender, 1957).

Data. The data for the two experiments consisted of the number of returns and nonreturns from the letters mailed.

¹ The opinions expressed in this article are those of the author and should not be construed as reflecting the views or endorsement of the Department of Health, Education, and Welfare.


The first experiment, using third-class mail, was conducted during September, October, and November of 1958. The first-class mail experiment was carried out during January, February, March, April, and May of 1959.

The deadline for each experiment was the end of the calendar month following the mailing period, e.g., December 31, 1958 for the third-class experiment and June 30, 1959 for the first-class experiment. The project letters were mailed daily within each month.
Techniques of measurement. The data were arranged in a fourfold contingency table. Chi square was computed by a simple formula that does not require calculation of the four expected frequencies (McNemar, 1955). The significance of χ² was determined with the Crawford Evaluator (Crawford, 1959).
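The text does not write out the "simple formula." We assume it is the usual shortcut for a fourfold table, χ² = N(ad − bc)² / [(a + b)(c + d)(a + c)(b + d)], used here without a continuity correction. The sketch below applies it to the returns reported under Results (520 and 230, each out of 10,000 mailed). In place of the Crawford Evaluator, it compares χ² with the tabled value 10.83 for p = .001, df = 1.

```python
def fourfold_chi_square(a: int, b: int, c: int, d: int) -> float:
    """Uncorrected chi square for [[a, b], [c, d]]: N(ad - bc)^2 / product of the four marginals."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


MAILED = 10_000
first_class = (520, MAILED - 520)   # (returns, nonreturns)
third_class = (230, MAILED - 230)   # (returns, nonreturns)

chi2 = fourfold_chi_square(*first_class, *third_class)
CRITICAL_001 = 10.83  # tabled chi square at p = .001 with df = 1
print(f"chi square = {chi2:.1f}, p < .001: {chi2 > CRITICAL_001}")
```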


RESULTS

Of 10,000 Ss contacted by first-class mail, 520 (5.2%) subscribed to the periodical. By contrast, only 230 (2.3%) of the 10,000 Ss asked to subscribe by third-class mail responded. The difference in response between the two mailings was significant at the .001 level of confidence. Table 1 contains the results of the two experiments.

TABLE 1
COMPARISON BETWEEN TWO TYPES OF MAILINGS IN DIRECT-MAIL ADVERTISING

Variable             Number Mailed    Returns    Percent Returns
First-class mail     10,000           520        5.2*
Third-class mail     10,000           230        2.3

* Significant at the .001 level of confidence.
SUMMARY AND CONCLUSIONS

In summary, a professional periodical set out to determine which type of mailing would contribute most to its subscription promotion program. Two experiments were conducted, one using first-class mail and the other third-class mail. The results show that first-class mail is considerably more effective than "junk mail," so much so that the additional postage expenditure is more than justified.

REFERENCES

BENDER, D. H. Colored stationery in direct mail advertising. J. appl. Psychol., 1957, 41, 161-164.
CRAWFORD, P. L. The Crawford evaluator for statistics χ² and c/r. Portsmouth, Ohio: Psycho-…, 1959.
MCNEMAR, Q. Psychological statistics. (2nd ed.) New York: Wiley, 1955.

(Received August 10, 1959)
THE COMPARATIVE VALIDITY OF TWO
BIOGRAPHICAL INVENTORY KEYS

SAM C. WEBB

Emory University


The primary purpose of this study is to re- 
port the increment in validity of prediction of 
first-year college grades when scores from a 
biographical inventory are added to a battery 
of predictors already in routine use in the 
liberal arts college of Emory University. A 
secondary purpose is to compare two methods 
for constructing inventory keys and to ex- 
amine some difficulties encountered in their 
use. 

THE EXPERIMENTAL VARIABLES 

The Existing Battery. The existing battery of predictors included the high school average, the total score on the ACE Psychological Examination (1948 edition), and the score on a locally constructed achievement test in mathematics. A regression equation for this battery was developed (Webb & McCall, 1953) from the data of 154 students (male and female) of the freshman class of 1951. It was used routinely from that time through 1956 to predict first-year grades for entering freshmen. The regression equation was as follows:

$$\text{Predicted Av.} = .528\,(\text{HS Av.}) + .428\,(\text{Emory math}) + .093\,(\text{ACE total}) - 9.58$$

The multiple correlation for this equation was .753. The standard error of estimate was 4.87.

¹ Junius A. Davis, formerly of Emory University and now Dean of the Graduate School, Woman's College of the University of North Carolina, provided the items for this inventory.

The Biographical Inventory. The biographical inventory was an experimental instrument developed in 1955. It contained 200 items of the multiple-choice variety.¹ The first 141 of these items permitted selection of one answer from two to five alternatives; the others permitted multiple responses. In all there were 914 response categories. The items provided information about physical and emotional health, the socioeconomic and cultural status of the family, family relations, and habits associated with intellectual pursuits such as reading and study. Also included were selected items about interests, attitudes, and personality traits supposedly related to academic achievement.

The Criterion. The criterion was average 
grades for the three quarters of the fresh- 
man year. Letter grades for all courses ex- 
cept physical education were converted to 
numerical values ranging from 40 for A to 0 
for F and averaged to give the yearly aver- 
age (Webb & McCall, 1953). 
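The sketch below shows how the 1951 equation is applied, with predictions on the same 0-to-40 criterion scale. The student's scores are hypothetical. The ±1 standard-error band assumes roughly normal prediction errors, which the text does not state.

```python
def predicted_average(hs_average: float, emory_math: float, ace_total: float) -> float:
    """1951 Emory regression equation for first-year average (0 = F ... 40 = A scale)."""
    return 0.528 * hs_average + 0.428 * emory_math + 0.093 * ace_total - 9.58


SE_ESTIMATE = 4.87  # standard error of estimate reported for the 1951 equation

# Hypothetical entering freshman
pred = predicted_average(hs_average=27.4, emory_math=12.4, ace_total=109.0)
low, high = pred - SE_ESTIMATE, pred + SE_ESTIMATE
print(f"predicted average {pred:.2f} (about 68% band {low:.2f} to {high:.2f})")
```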

The tests and the biographical inventory 
were given during orientation week before the 
beginning of classes. 


TABLE 1
ITEMS CLASSIFIED BY KEYS AND ASSIGNED WEIGHTS

                               Boys                      Girls
Correlation              External     Deviate      External     Deviate
                         Criterion                 Criterion
.25 or higher               36           …            45           34
.10 to .24                  …            …           177          165
−.09 to .09                 …            …           426          482
−.10 to −.24                …            …           203          187
−.25 or lower               …            …            63           46
Total weighted items        …            …           488          432
TABLE 2
VALIDATION DATA FOR 280 BOYS* AND 203 GIRLS IN 1956 FRESHMAN CLASS

[Intercorrelations, means (M), and standard deviations (σ) of X1 to X6, boys and girls]

X1 = HS Average. X2 = Emory Math. X3 = Total ACE. X4 = Deviate Technique Score. X5 = External Criterion Score. X6 = First Year Average.
* Boys in upper portion, girls in lower portion of table.
PROCEDURE AND SUBJECTS


Neidt and Malloy (1954) have discussed and compared procedures appropriate for assessing the contribution of an additional variable to the validity of a battery already in use. Following their procedures, two keys for the inventory were developed and cross-validated. For each key, the relation between item response and the criterion was determined from Flanagan's (1936) table. For the external criterion technique key, the criterion groups were formed by selecting the upper and lower 27% of students ranked on first-year average grades. For the deviate technique key, the criterion groups were formed by selecting the upper and lower 27% of students ranked on the magnitude of their deviate scores. These scores were obtained by computing the difference between the earned and predicted average for each student.
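The two ways of forming criterion groups described above can be sketched as follows. The ten students and their averages are hypothetical. Two choices are assumptions of this sketch: ranking on the signed deviate score (earned minus predicted), and taking round(0.27 × n) students per group. The later item analysis with Flanagan's table is not reproduced.

```python
def extreme_groups(scores: dict[str, float], fraction: float = 0.27) -> tuple[set[str], set[str]]:
    """Return (upper, lower): the top and bottom `fraction` of students ranked by score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = round(len(ranked) * fraction)
    return set(ranked[:k]), set(ranked[-k:])


# Hypothetical students: (earned average, predicted average)
students = {
    "A": (30.0, 22.0), "B": (28.0, 29.0), "C": (25.0, 19.0), "D": (22.0, 20.0),
    "E": (20.0, 13.0), "F": (18.0, 18.0), "G": (15.0, 20.0), "H": (12.0, 9.0),
    "I": (10.0, 16.0), "J": (8.0, 10.0),
}

earned = {s: e for s, (e, _) in students.items()}
deviate = {s: e - p for s, (e, p) in students.items()}  # earned minus predicted

upper_ext, lower_ext = extreme_groups(earned)    # external criterion technique
upper_dev, lower_dev = extreme_groups(deviate)   # deviate technique
print(sorted(upper_ext), sorted(lower_ext))
print(sorted(upper_dev), sorted(lower_dev))
```

Student E, for example, is only average on earned grades but far exceeds prediction, so E enters the deviate key's upper group and not the external criterion key's.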

Neidt and Malloy (1954) and others cited by them (Meyers & Schultz, 1950; Neidt & Edmison, 1953; Neidt & Merril, 1951; Schultz & Green, 1953) derived their regression equations for computing deviate scores on the validation sample. For this investigator, however, the notion of an "existing battery" implies a battery that has been validated previously on another sample and is being used to make predictions for other samples. From this viewpoint, the problem becomes one of determining what validity the inventory will add to such predictions. Following this interpretation, deviate scores were obtained by computing the differences between earned grades and the grades predicted by the regression equation derived on the 1951 class, for the validating samples.

Using the data of 326 boys and of 211 girls in the freshman class of 1956 separately, an external criterion technique key and a deviate technique key for scoring the Emory Biographical Inventory were constructed. Inventory scores were then obtained from both keys. To avoid negative scores, a constant of 100 was added to each score. The intercorrelations and validities of all predictors, based on 280 boys and 203 girls, were computed; they are shown in Table 2. (This reduction in sample size, compared with the number used in validating the items, results from having included variables other than those discussed here in computing the intercorrelation matrix.)

The two keys for each sex were then cross-validated on the data for 303 boys and 172 girls in the freshman class of 1955.² The intercorrelations and validities for the predictors are shown in Table 3.

For both the validation and cross-validation groups, the appropriate multiple correlations were computed; they are shown in Table 4. The significance of the difference between R for the existing battery, computed from the sample data, and R for the battery augmented by each scoring key was determined for both sexes in both samples by an appropriate F test (McNemar, 1955, p. 279).

Similar tests were computed for the difference between R for the existing battery, taken as the correlation of earned grades with grades predicted by the 1951 regression equation, and R for these predictions augmented by each scoring key. A summary of these tests is provided in Table 5.

² Since the 1955 and 1956 classes were expected to be essentially similar, the larger class was used for item validation in the interest of securing more stable item weights.

TABLE 3
CROSS-VALIDATION DATA FOR 303 BOYS* AND 172 GIRLS IN 1955 FRESHMAN CLASS

[Intercorrelations of X1 to X6, boys and girls]

Boys       M        σ
X1       27.4      6.6
X2       12.4      6.0
X3      109.0     20.6
X4      118.8     12.2
X5      117.4     18.3
X6       19.0      8.0

X1 = HS Average. X2 = Emory Math. X3 = Total ACE. X4 = Deviate Technique Score. X5 = External Criterion Score. X6 = First Year Average.
* Boys in upper portion, girls in lower portion of table.

TABLE 4
SUMMARY OF MULTIPLE CORRELATIONS

               1956 Validation Sample     1955 Cross-Validation Sample
               Boys        Girls          Boys        Girls
               N = 280     N = 203        N = 303     N = 172
R0.1           .491        .518           .493        .476
R0.12          .637        .611           …           .559
R0.123         .673        .648           …           .586
R0.1234        .824        .836           …           .599
R0.1235        .791        .774           …           .600
R0.6           .659        .648           …           .584
R0.64          .820        .801           …           .597
R0.65          .800        .771           …           .598

0 = First Year Grades. 1 = HS Average. 2 = Emory Math. 3 = Total ACE. 4 = Deviate Technique Score. 5 = External Criterion Score. 6 = Predicted Average (1951 Formula).
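The F test cited from McNemar is presumably the usual ratio for the gain in R² when predictors are added to a battery. The sketch below applies that form to R values from Table 4. With df = 1/198 and 1/275, the results agree with the corresponding Table 5 entries (183.5 and 126.9), which supports the assumed form.

```python
def f_for_added_predictors(r_full: float, r_reduced: float, n: int,
                           k_full: int, k_reduced: int) -> tuple[float, int, int]:
    """F ratio for the gain in R^2 when predictors are added to a battery.

    Returns (F, df1, df2) with df1 = k_full - k_reduced and df2 = n - k_full - 1.
    """
    df1 = k_full - k_reduced
    df2 = n - k_full - 1
    f = ((r_full ** 2 - r_reduced ** 2) / df1) / ((1 - r_full ** 2) / df2)
    return f, df1, df2


# 1956 girls: existing battery R0.123 = .648; plus deviate key, R0.1234 = .836
f_girls, df1, df2 = f_for_added_predictors(0.836, 0.648, n=203, k_full=4, k_reduced=3)
# 1956 boys: existing battery R0.123 = .673; plus external criterion key, R0.1235 = .791
f_boys, _, _ = f_for_added_predictors(0.791, 0.673, n=280, k_full=4, k_reduced=3)
print(f"girls: F({df1}, {df2}) = {f_girls:.1f}; boys: F = {f_boys:.1f}")
```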


RESULTS 
Table 1 provides a tabulation of the num- 
ber of item responses assigned the various 
scoring weights for the two keys. This table 


shows that from 40 to 50% of the items are 
weighted on the various keys, that there were 


TABLE 5
TESTS OF SIGNIFICANCE OF DIFFERENCE IN BATTERY VALIDITIES

Compared with R0.123
Sex      Class    df       F for R0.1234    F for R0.1235
Boys     1956     1/275    …                126.9**
Boys     1955     1/298    8.9**            …
Girls    1956     1/198    183.5**          88.5**
Girls    1955     1/167    4.1*             4.3*

Compared with R0.6
Sex      Class    df       F for R0.64      F for R0.65
Boys     1956     1/275    245.7**          194.82**
Boys     1955     1/298    7.3**            …
Girls    1956     1/198    425.0**          147.70**
Girls    1955     1/167    4.0*             4.30*

* Significant at 5% level of confidence.
** Significant at 1% level of confidence.
more items with negative weights than with positive weights, and that the deviate keys have fewer weighted items than the external criterion technique keys.

The data in Tables 4 and 5 indicate that
for the 1956 class the difference in validity 
for the existing battery and the battery plus 
either scoring key is significant at a 1% level 
of confidence. This result obtains whether the 
R for the existing battery is a least-squares 
fit to the sample at hand or is obtained by 
use of grades predicted from the 1951 regres- 
sion equation. But since these groups are the 
validation groups, such results are expected. 

In the cross-validation samples there is a 
smaller difference in validity obtained by add- 
ing either scoring key to the existing battery. 
These differences between Rs, however, are 
still significant at the 1% level of confidence 
for the boys, and at a 5% level of significance 
for the girls. These findings hold whether the 
R for the existing battery is a least-squares 
fit to the samples at hand or is obtained by 
use of grades predicted from the 1951 regres- 
sion equation. 

In the validation samples the deviate key 
adds more to R than does the external cri- 
terion key. In the cross-validation studies this 
advantage does not obtain. 

Finally, it should be noted that the differ- 
ences between the validity for the existing 
battery obtained from the least-squares fits 
for the samples at hand and the validity ob- 
tained by using predictions from the 1951 
regression equation are quite small. 


DISCUSSION AND FURTHER ANALYSIS 


In the cross-validation samples of this study, the deviate key is no more effective than the external criterion key in adding to the validity of the existing battery. Thus the superiority of the deviate technique over the external criterion technique, as claimed and reported by Neidt and Malloy (1954), is not substantiated here.

Further, in the cross-validation samples, the increment in validity obtained by adding either inventory key to the battery, although statistically significant, is so small that it calls into question the practicality of adding the inventory to the battery. This finding is also contrary to that of Neidt and Malloy (1954).

Since the drop in validity contributed by 
the inventory keys from the validation to the 
cross-validation situation is large in compari- 
son with the shrinkage as estimated by the 
Wherry shrinkage formula (1955, p. 186), 
and since the procedures in developing the 
deviate key differed somewhat from those 
used by previous investigators, the writer 
conducted additional analyses with the ex- 
pectation of securing a clearer understanding 
of causes for the reported results. On the ba- 
sis of such an understanding a more informed 
judgment of whether further work in develop- 
ing the inventory should be undertaken could 
be made. 
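Wherry's correction has been published in several algebraic forms. The sketch below uses one common form, which we assume is close to the one cited. It applies the correction to the 1956 girls (R0.1234 = .836, N = 203, four predictors) to show how little shrinkage the formula predicts compared with the cross-validated R0.1234 of .599 in Table 4.

```python
import math


def wherry_shrunken_r(r: float, n: int, k: int) -> float:
    """Estimate the population multiple R from a sample R with k predictors and n cases.

    Uses the common form R^2_adj = 1 - (1 - R^2)(n - 1)/(n - k - 1).
    """
    r2_adj = 1 - (1 - r ** 2) * (n - 1) / (n - k - 1)
    return math.sqrt(max(r2_adj, 0.0))


# 1956 girls: battery plus deviate key, R0.1234 = .836 with N = 203 and k = 4
expected = wherry_shrunken_r(0.836, n=203, k=4)
observed_cross_validation = 0.599  # 1955 girls, R0.1234
print(f"shrunken R = {expected:.3f}; cross-validated R = {observed_cross_validation}")
```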

Comparability of Samples. The procedures of Gulliksen and Wilkes (1950) showed no significant differences, at the 5% level of confidence, between the sexes in the 1956 and 1955 classes with respect to standard errors of estimate and slopes of the regression lines. The test did show significant differences among the three total classes with respect to standard errors of estimate. A comparison of the classes in pairs showed a significant difference only between the 1951 and 1955 classes. But since this difference is, in round numbers, only one point on the grade scale, it is doubtful that it has much practical significance. Assuming no difference in the standard errors of estimate, the hypothesis of parallel slopes is not rejected in any comparison.

Use of the 1951 Regression Equation. In 
respect to the intercepts of the regression 
lines the procedures of Gulliksen and Wilkes 
(1950) further showed significance of differ- 
ences at a 5% or higher level of confidence 
as between boys and girls of the 1956 class 
and of the 1955 class, significant differences 
among the total classes of 1951, 1955, and 
1956, and significant differences in all class 
comparisons by pairs except the 1951-1955 
pair. 

But since students were selected for the item validation groups according to their ordering on deviate scores, the choice of the particular equation for predicting grades should be immaterial. This expectation was verified by comparing deviate scores for the 1956 class computed from regression equations derived from the 1951 and 1956 classes. (These classes differed significantly, at the 1% level of confidence, with respect to intercept.) The correlation between the two sets of deviate scores was .981 for the boys and .962 for the girls. Stated differently, 90% of the students selected for the upper and lower 27% groups would be the same whether the deviate scores were derived from the 1951 or the 1956 regression equation.

Stability of Item Weights. Using the deviate technique and the 1951 regression equation, item weights were computed on the data of the 1955 class. These weights were compared with those obtained for the 1956 class by means of contingency tables. The contingency coefficient was .17 for the boys and .18 for the girls. Although these values differ significantly from zero at the 5% level of confidence (for boys χ² = 29.48; for girls χ² = 29.58; df = 16), they are quite low as indices of stability. This is evident from the fact that, for boys, of 379 items receiving positive or negative weights for the 1956 group, only 29% received a weight of like sign for the 1955 group. For girls, of 432 items receiving positive or negative weights for the 1956 group, only 34% received a weight of like sign for the 1955 group. The number of items receiving the same weight in the 1955 and 1956 samples, expressed as a percentage of the number receiving that weight in 1956, is shown in Table 6. These data indicate that the greater the absolute magnitude of a weight, the less stable it is.
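The contingency coefficient used here is presumably Pearson's C = √(χ²/(χ² + N)). The sketch below computes it for a small hypothetical 3 × 3 table of weights; the study's table was 5 × 5, since df = 16. Note that C cannot reach 1. For a k × k table its maximum is √((k − 1)/k), about .89 when k = 5, so values such as .17 and .18 should be read against that ceiling.

```python
import math


def chi_square(table: list[list[int]]) -> float:
    """Pearson chi square for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def contingency_coefficient(chi2: float, n: int) -> float:
    """Pearson's C = sqrt(chi^2 / (chi^2 + N))."""
    return math.sqrt(chi2 / (chi2 + n))


# Hypothetical cross-tabulation of item weights (rows: 1956 key, columns: 1955 key),
# collapsed to three weights (-1, 0, +1) to keep the example small.
weights = [
    [10, 5, 5],
    [5, 10, 5],
    [5, 5, 10],
]
chi2 = chi_square(weights)
n = sum(map(sum, weights))
df = (len(weights) - 1) * (len(weights[0]) - 1)
print(f"chi square = {chi2:.2f}, df = {df}, C = {contingency_coefficient(chi2, n):.3f}")
```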
Response Frequency. The relation of extreme response frequency to the magnitude of item weight was considered next. By definition, an item was considered to display extreme response frequency if fewer than 10% or more than 90% of the students in either the upper or the lower 27% group marked it. For both sexes of the 1956 class, approximately one-half of the items with positive or negative weights could be so characterized.

For the various item weights, the percentage of items having extreme response frequencies in the 1956 sample is shown in Table 6. These percentages vary directly with the absolute size of the scoring weights and inversely with the percentage of items having identical weights in the 1955 and 1956 samples.
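The extreme-response rule defined above reduces to a one-line test. The items and counts below are hypothetical. A group of 88 students is roughly 27% of the 326 boys in the 1956 validation sample.

```python
def has_extreme_frequency(marks_upper: int, n_upper: int,
                          marks_lower: int, n_lower: int,
                          low: float = 0.10, high: float = 0.90) -> bool:
    """True if fewer than 10% or more than 90% of either criterion group marked the response."""
    proportions = (marks_upper / n_upper, marks_lower / n_lower)
    return any(p < low or p > high for p in proportions)


# Hypothetical items: (marks in upper group, upper n, marks in lower group, lower n)
items = {
    "reads a daily newspaper": (80, 88, 70, 88),  # 90.9% of upper group: extreme
    "studies with radio on": (30, 88, 45, 88),    # 34% and 51%: not extreme
    "skipped a grade": (8, 88, 2, 88),            # 9.1% and 2.3%: extreme
}
for name, counts in items.items():
    print(f"{name}: {'extreme' if has_extreme_frequency(*counts) else 'not extreme'}")
```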

Assuming that at least some of the item responses are basically valid, these findings suggest that an important cause of the drop in validity of the inventory keys from the validation to the cross-validation sample is the instability of the item weights. One factor contributing to this instability is the extreme response frequency on which about half of the item validities are based. A second possible factor, as noted by Schultz and Green (1958), is a difference in the social atmosphere within which the 1955 and 1956 classes gave their responses. The writer, however, has no data to substantiate such a difference.

Suggested procedures which might improve the stability of item weights include (a) validating items on two or more samples and retaining only items which show validity for all samples, (b) eliminating items with extreme response frequencies from scoring keys, (c) assigning only unit weights, and (d) revising items to reduce extreme response frequencies. Subsequent research might show one of these suggestions, or some combination of them, to be effective in providing scoring keys with more stable weights.

TABLE 6
ANALYSIS OF ITEM WEIGHTS IN TERMS OF STABILITY AND RESPONSE FREQUENCY

                              No. receiving    Percentage with same     Percentage having
Sex     Weight                weight in 1956   weight in 1956 and       extreme response
                              sample           1955 samples             frequency
Boys    Largest positive        20                5                       70
        Positive               159               23                       34
        Zero                   535               61                       24
        Negative               174               24                       45
        Largest negative        26                7                       71
Girls   Largest positive        34                …                        …
        Positive                 …                …                        …
        Zero                     …                …                        …
        Negative                 …                …                        …
        Largest negative         …                …                        …

Overlap of Item Analysis Groups. The procedures for constructing the deviate and external criterion keys differ only in the method of selecting Ss. Thus the difference in what the two keys can add to the validity of the existing battery is partly a function of the overlap between the groups of students used in item validation. Theoretically, the percentage overlap in the upper 27% and lower 27% groups of the two procedures should vary as a function of the standard error of estimate of the regression equation. The smaller the standard error of estimate, the smaller the overlap and the greater the possible difference between the two keys in their contribution to validity. This possibility is complicated, however, by the fact that as the standard error of estimate decreases, the overlap between the upper 27% group of the external criterion procedure and the lower 27% group of the deviate procedure increases, which would tend to reduce differences in validity.

In the present study, an analysis of the upper and lower 27% groups of the deviate procedure for both sexes of the 1956 class showed that from 48 to 66% of these students would also fall in the respective upper and lower 27% groups of the external criterion procedure. This lack of overlap is large enough to lead one to expect a difference in the validity of the two keys. The expectation is confirmed for the validation groups. That it is not confirmed for the cross-validation groups can probably be attributed to the unreliable item weights discussed previously.


SUMMARY 


The purpose of this study was to determine how much the scores on a biographical inventory, obtained by two keys constructed by different methods, can add to the validity of a battery of variables already in use for predicting freshman grades in a liberal arts college. The existing battery consisted of the high school average, the ACE test (total score), and a mathematics test. The two keys, called the external criterion key and the deviate key, differed in that item validation for the former was performed on the upper and lower 27% of students ranked on first-year average grades, while item validation for the latter was performed on the upper and lower 27% of students ranked on the deviations of their earned averages from their predicted averages.

The results showed that, for boys and girls in both the validation and cross-validation groups, the difference between the validity of the existing battery and that of the battery plus either scoring key was significant at the 5% level of confidence or better. This result held whether the R for the existing battery was a least-squares fit to the sample at hand or was obtained from grades predicted by a regression equation based on a third college class.

The deviate key added more to the multiple 
correlation than did the external criterion key 


for the validation groups. 

In the cross-validation groups, however, the 
differences were smaller than in the valida- 
tion groups—in fact, it is questionable 
whether the difference is of sufficient magni- 
tude to justify the expense of adding the in- 
ventory to the battery. 

Further analyses were undertaken in search 
for possible causes for the drop in validity 
from the validation to the cross-validation 
group. These analyses indicated considerable 
instability of the item weights and the insta- 
bility was found to be related to extreme re- 
sponse frequency. 

Procedures which might help stabilize item 
weights were suggested for further study. 
Auditory cueing in dynamic localization tasks has received some attention in the literature (Chapanis, Garner, & Morgan, 1949; Fitts, 1951; Garner, 1949; Humphrey, 1952). Dynamic localization involves tracking a target in a spatial area, as with a radio range or the Flybar system. Fitts (1951, p. 1314) discussed the need for systematic investigations of combined visual and auditory displays. Such studies would include static localization tasks, in which an operator in a man-machine system serves as a monitor of an instrument display.

This study investigated the use of auditory cues to reduce operator visual search time in the interactions of a man-machine system. Specifically, could auditory signals of dichotomous dimensions assist an operator in a visual search task? The experimental questions posed were:

1. Can dimensions of auditory signals significantly reduce search time in a dial-reading task?
2. Can such auditory cues overcome "newspaper" search patterns such as those reported by Lincoln and Averbach (1956) in their study of spatial factors in check dial-reading?


METHOD 


Subjects. The subjects (Ss) were 50 college students, 33 males and 17 females, randomly assigned to one of the five conditions tested.

Task. A simulated man-machine visual display was provided for Ss to use in a visual search task. The "operators" were asked to search a 32-dial display and locate the one dial that was not set in a "normal" position. Any reduction in visual search time, compared with a control condition, was taken as an operational measure of increased visual search efficiency resulting from the various cueing dimensions investigated.

Apparatus. A simulated instrument panel was constructed from 4-in. Masonite board and mounted in a 2 × 4-in. frame. Thirty-two dial faces of the fixed-dial type, 2 in. in diameter, were prepared. The dial gradient was 100 × 10, with .100-in. markings drawn with a No. 2 Leroy pen. The dial faces were reproduced on Ozalid paper. The basic dial design was modified to increase the difficulty of the search task and thereby increase the variance of the measured dependent variable. This was done by positioning the zero of each dial at one of eight 45° points around the dial face, so that there were eight dial faces differing only in their zero positions. These dial faces were mounted in counterbalanced order in a 4 × 8 matrix.

For reference purposes, the 32 dials on the panel were divided into eight sectors (S), each containing four dials, as diagrammed in Fig. 1. The dial placement pattern in any one sector on the left half of the panel was repeated in the opposing sector on the right half, so that, with regard to the location of the zero positions of the dials, the two halves of the panel formed mirror images. For example, Sector S… mirrored Sector S…, S… mirrored S…, etc. These sectors will be referred to throughout the remainder of the report.

Dial pointers (… in.) were prepared from commercial bobby pins. The pointers were inserted through the panel so that their position on the dial face could be adjusted manually against calibrations on the back of the panel. The normal position of the pointer was at zero. A dial was made "deviant" by moving its pointer (from the back of the panel) … or more from zero in either direction, according to the trial program that was developed.
Fig. 1. Dial panel layout by sector with cueing code. The zero position displacement involved in the dial face modification is shown by the position of the pointers.
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A 3-in. metal strip containing eight single-pol 
single-throw switches was mounted under each row 
of eight dials. Each switch was positioned directly 
beneath its corresponding dial 

A gravity-drop shutter covered the entire dial 
panel so that the dial pointers could be adjusted by 
E unobserved by Ss. The shutter contained thirty- 
two 24-in. apertures corresponding to the 32 dials 
on the panel. The shutter was opened by a solenoid- 
drop mechanism and was closed manually. 

The shutter-drop mechanism, timer, and the signal 
generating equipment were electronically integrated 
in a control panel so that the auditory signal and 
the timer could be initiated by the same switch that 
opened the shutter. This equipment made it possible 
to generate an audio signal of any frequency and of 
a wide range of durations for either monaural or 
binaural presentation 

The experiment was conducted in an 8 × 10-ft room which adjoined a smaller room which contained the electronic equipment. The dial panel was secured 14… in. from the edge of a table (27 in. high, 39… in. wide, 26 in. deep) placed in the doorway between the two rooms. An adjustable stool faced the panel. A split headset served to present the auditory signals to the S.

Auditory Cueing Dimension


Coded audio signals were used to provide search cues to Ss as they carried out the experimental visual search task. The basic carrier signal consisted of a repetitive pure tone with a 1:1 on/off ratio. Three dichotomous dimensions were selected to characterize the auditory cueing signals.

Direction was utilized as the first cueing dimension by presenting the signal, through a split headset, to either the right or the left ear. The second cueing dimension was derived from signal frequency, where a 1000 cps tone was considered “high” and a 500 cps tone was considered “low.” These values represented a frequency difference of two intervals under absolute judgment conditions reported by Pollack (1952). Fletcher’s (1933) equal loudness contours were approximated to control intensity as a function of frequency. All frequencies were presented at 30 db (re 0.0002 dynes/cm²).

The third dichotomous cueing dimension, duration, consisted of a “short” tone of .2 sec and a “long” tone of .5 sec. These duration values were selected on the basis of a pilot study which indicated that they could be readily distinguished by Ss.

For those conditions in which the audio signal carried no cueing information, dimension values for frequency and duration were selected which fell midway between the cueing values listed above. For example, in the control condition the signal presented to the S was a 750 cps tone of .35 sec duration. By using the above cueing dimensions singly or in combination, it was possible to vary the signal presentation across five conditions of auditory cueing.
Conditions of Cueing. In Condition C₁, the control condition, no cueing dimensions were included on the auditory signal, so that the S had no means of knowing which sector of the dial panel contained the deviant dial.

The direction dimension was used in Condition C₂. The tone was presented to the right ear if the deviant dial was on the right half of the panel (Sectors S₃, S₄, S₇, S₈), and to the left ear if the deviant dial was on the left half of the panel (Sectors S₁, S₂, S₅, S₆).

The frequency dimension cued Ss to search either the upper or lower half of the panel. In Condition C₃, the frequency dimension was combined with the direction dimension so that the auditory signal, if properly interpreted, cued the S to the panel quadrant containing the deviant dial. That is, the two-dimensional cue could be used to narrow the search task from eight sectors to two horizontally aligned sectors (S₁, S₂; S₃, S₄; S₅, S₆; or S₇, S₈).
The duration dimension was presented with the direction dimension in Condition C₄. A tone of short duration cued the S that the deviant dial was in one of the inner sectors of the panel, and a long tone cued Ss that the deviant dial was in one of the outer sectors. This two-dimensional cue, if properly interpreted, restricted Ss’ search task to two vertically aligned sectors (S₁, S₅; S₂, S₆; S₃, S₇; or S₄, S₈).

All cueing dimensions were combined in Condition C₅. The three dimensions, direction, frequency, and duration, could in combination indicate which one of the eight sectors on the panel contained the deviant dial.
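The five conditions can be read as encoding the deviant dial's sector in zero to three binary cue dimensions. The Python sketch below is illustrative only: it assumes the sector layout implied by the text (upper row S₁ to S₄ and lower row S₅ to S₈, each numbered left to right), and the names `SECTORS`, `CONDITIONS`, `cue_values`, and `candidate_sectors` are hypothetical. For each condition it lists the sectors that remain consistent with a correctly interpreted cue.

```python
# Hypothetical model of the auditory cueing code; all names are illustrative.
# Assumed layout: upper row S1..S4 and lower row S5..S8, left to right.
SECTORS = {s: (0 if s <= 4 else 1, (s - 1) % 4) for s in range(1, 9)}  # (row, column)

# Cue dimensions carried by each condition.
CONDITIONS = {
    "C1": (),                                    # control: no cue
    "C2": ("direction",),
    "C3": ("direction", "frequency"),
    "C4": ("direction", "duration"),
    "C5": ("direction", "frequency", "duration"),
}

def cue_values(sector):
    """Value of each cue dimension for a deviant dial in `sector`."""
    row, col = SECTORS[sector]
    return {
        "direction": "right ear" if col >= 2 else "left ear",   # right half vs left half
        "frequency": "1000 cps" if row == 0 else "500 cps",     # upper vs lower half
        "duration": "short .2 sec" if col in (1, 2) else "long .5 sec",  # inner vs outer
    }

def candidate_sectors(condition, deviant_sector):
    """Sectors a correctly interpreting S would still have to search."""
    dims = CONDITIONS[condition]
    cue = {d: cue_values(deviant_sector)[d] for d in dims}
    return sorted(s for s in SECTORS
                  if all(cue_values(s)[d] == v for d, v in cue.items()))

for cond in CONDITIONS:
    print(cond, candidate_sectors(cond, 7))
```

For a deviant dial in S₇ it prints all eight sectors for C₁, [3, 4, 7, 8] for C₂, [7, 8] for C₃, [3, 7] for C₄, and [7] for C₅, which mirrors the halving, quartering, and single-sector cueing described above.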

Strips of masking tape were used in the various conditions to provide appropriate reference lines to subjects and to facilitate training. Except for auditory cueing differences, as reported above, all conditions were similar.

Procedure. Ss were seated facing the dial panel and fitted with a headset. Each S was given signal orientation with the panel shutter closed. A practice trial was presented for each sector of the panel, making a total of eight practice trials. The timer and auditory signal were initiated simultaneously with the exposure of the dials. The S then searched the panel for the deviant dial. When he had located the deviant dial, S opened the corresponding toggle switch, which stopped the timer and the audio signal, thus completing the trial. The shutter was then closed and E, concealed by a curtain, returned the deviant dial to normal and set up the next dial. There was a total of 64 trials, two for each dial on the panel, presented to all Ss in a standard randomized program. Response times to the nearest .01 sec were recorded. The possibility of Ss stopping the trial with an inappropriate response (by opening a wrong switch) was eliminated by wiring each response switch independently through the control panel to the timer.


RESULTS 


A Lindquist Type I analysis of variance design (1953) was used to compare the measured variable across five conditions of the independent variable. The Lindquist design was modified to increase the sensitivity of within-persons tests by considering the within-cell source of variation: the within-cell sums of squares were pooled with the Sectors × People within Conditions variation. The unit of observation was the sum of response times for two trials on each of 32 dials. A test on the C × S interaction was conducted to determine whether the C treatments were effecting a break-up of the “newspaper-type” search pattern across S sectors. The hypothesis to be tested in reference to the first experimental question, posed above, was that the mean response times for the C treatments did not differ significantly. The analysis of variance conducted on all conditions, summarized in Table 1, revealed that the main effect for C treatments yielded an F ratio of 16.73, which is significant at the .01 level of confidence.

TABLE 1
ANALYSIS OF VARIANCE OF RESPONSE TIMES BY CONDITION AND SECTOR

Source of Variation                        df          SS           MS          F
Between People                                    35,169.68
   Conditions                                     32,953.52     8,238.38    16.73*
   People within Conditions                        2,216.16       492.48
Within People                                      9,488.69
   Sectors                                           846.23       120.89
   Sectors × Conditions                            1,725.34
   Sectors × People within Conditions              6,917.12
Within Cell                               1,200   19,800.95
Total                                     1,599   64,459.32

* Indicates an F ratio that is significant at the .01 level.

The Duncan test (1955) was conducted on pairs of condition means; a summary is presented in Table 2. This test indicated that all experimental conditions yielded means significantly smaller than the mean of the control condition. The mean of Condition C₂, which involved a one-dimensional cue (direction), was not only significantly less than the control mean but also significantly larger than the means of the other experimental conditions. The two bidimensional cueing conditions (C₃, C₄) did not differ significantly from the three-dimensional condition (C₅). In short, any of the auditory cueing conditions significantly reduced response time; the response times for the two-dimensional cueing conditions and the three-dimensional condition were essentially the same, but significantly lower than that of the unidimensional condition.

TABLE 2
DUNCAN TESTS FOR SIGNIFICANT DIFFERENCES BETWEEN PAIRS OF CONDITION MEANS

Condition:     C₃       C₄       C₅       C₂        C₁
Mean**:       6.21       …        …      10.49     18.15

** Means joined by common underline are homogeneous. Means not joined by common underline differ at the .05 level of significance.
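The entries of Table 1 can be cross-checked with a little arithmetic. The sketch below rests on a layout the text implies but does not state outright: 1,600 observations (total df 1,599) at 32 dials per S gives 50 Ss, assumed here to be 10 per condition, with 8 sectors of 4 dials each. Under that assumption it derives the degrees of freedom, confirms that the printed sums of squares add up to the total, and recovers the printed MS values and the F of 16.73 for Conditions.

```python
# Sketch: consistency checks on Table 1 under an ASSUMED design of
# 5 conditions x 10 Ss each, 8 sectors per S, 4 dials per sector.
n_cond, n_per_cond, n_sect, n_dials = 5, 10, 8, 4
n_people = n_cond * n_per_cond
N = n_people * n_sect * n_dials          # total number of observations

df = {
    "Conditions": n_cond - 1,
    "People within Conditions": n_people - n_cond,
    "Sectors": n_sect - 1,
    "Sectors x Conditions": (n_sect - 1) * (n_cond - 1),
    "Sectors x People within Conditions": (n_sect - 1) * (n_people - n_cond),
    "Within Cell": n_people * n_sect * (n_dials - 1),
}
df_total = N - 1

ss = {  # sums of squares as printed in Table 1
    "Conditions": 32953.52, "People within Conditions": 2216.16,
    "Sectors": 846.23, "Sectors x Conditions": 1725.34,
    "Sectors x People within Conditions": 6917.12, "Within Cell": 19800.95,
}
ss_total = 64459.32

ms_cond = ss["Conditions"] / df["Conditions"]
ms_sect = ss["Sectors"] / df["Sectors"]
F_cond = ms_cond / 492.48      # printed MS for People within Conditions used as error term

print(df["Within Cell"], df_total)                              # 1200 1599
print(round(sum(ss.values()), 2), ss_total)                      # 64459.32 64459.32
print(round(ms_cond, 2), round(ms_sect, 2), round(F_cond, 2))    # 8238.38 120.89 16.73
```

The sketch takes the printed error MS of 492.48 as given; dividing the printed SS of 2,216.16 by the assumed 45 df would give about 49.25 instead, so one of those two printed figures does not fit the assumed layout.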

The second experimental question was represented by the hypothesis that there was no significant interaction between sectors and treatments. The analysis of variance shown in Table 1 indicated that C conditions were interacting with S sectors; the S × C interaction was significant at the .01 level. The Duncan test on differences between pairs of means indicated that under the control condition, C₁, there was a differential mean response time for various sectors on the panel, as shown in Table 3. The mean response time for the upper right sector, S₄, differed from all other sector mean times at the .05 level of confidence. Sectors S₅, S₆, S₇, and S₈, the lower sectors, required the longest response times; they differed from the mean response times of the upper left sectors, S₁, S₂, and S₃, at the .05 level of significance. Sector S₄ required a shorter response time than Sectors S₅, S₆, S₇, and S₈, but a significantly longer time than that required by Sectors S₁, S₂, and S₃.

TABLE 3
DUNCAN TESTS FOR DIFFERENCES BETWEEN PAIRS OF MEANS BY SECTORS FOR CONDITIONS SHOWING SIGNIFICANT C EFFECTS ON S

A. Condition C₁
Sector:     S₁  …
Mean**:     14.92    15.08    …    19.68

B. Condition C₂
Sector:     S₁  …
Mean**:     9.18    9.25    …

C. Conditions C₃, C₄, C₅ (composite)
Sector:     S₁  …
Mean:       6.83    6.00    …    6.45    6.79

** Means joined by common underline are homogeneous. Means not joined by common underline differ at the .05 level of significance.

In Condition C₂, the upper sectors, S₁, S₂, S₃, and S₄, required a shorter response time than did the lower sectors, S₅, S₆, S₇, and S₈. This difference is indicated in Table 3 and is significant at the .05 level.

Conditions C₃, C₄, and C₅ produced homogeneous means across sectors, which indicated that there was no differential response time across sectors for those conditions.


DISCUSSION


As Table 2 indicates, the hypothesis that auditory cueing can reduce response time to a deviant dial in a visual search task seems to be borne out. Unassisted visual search, as in the control condition, C₁, required an average of 18.15 sec. for two observations of any one dial. The addition of a lateral cue in Condition C₂ substantially reduced this mean time to 10.49 sec. The most effective cue was the two-dimensional laterality/frequency cue in Condition C₃, which reduced the mean response time to 6.21 sec. Results for Conditions C₄ and C₅ were essentially identical to those for C₃.

A previous study reported by Lincoln and Averbach (1956) indicated that Ss tended to search quadrants of a visual display in a pattern which was attributed to long-established reading habits. Using a criterion measure of visual search efficiency (the percentage of deviant dials detected in each panel quadrant), they found that Ss located a greater percentage of deviant dials in the following quadrant order: upper left, upper right, lower left, lower right. These differential percentages were interpreted as indicating that Ss tended to search the panel in “newspaper” fashion, i.e., from left to right and from top to bottom.

Table 3 of this study indicates that multidimensional cues seemed to overcome a similar search pattern (as expressed in differential mean response time across sectors) which was observed in the control and unidimensional conditions, C₁ and C₂, respectively. In the control condition, C₁, Sectors S₁, S₂, and S₃ (upper left sectors on the panel) showed a mean response time significantly shorter than the response times for the remaining sectors. This seemed to indicate that Ss tended to search the upper left sectors of the display first, in “newspaper” fashion. Sector S₄ showed a longer mean response time than Sectors S₁, S₂, and S₃, but a significantly shorter time than any of the lower sectors (S₅, S₆, S₇, and S₈).

The results for C₂ indicated that the one-dimensional cue tended to equalize the search pattern, in that the mean response time for Sector S₄ was brought into homogeneity with the mean times for the other three upper sectors. However, the lower sectors, S₅, S₆, S₇, and S₈, all required a significantly longer search time than did the upper sectors (S₁, S₂, S₃, and S₄).

Those conditions having multidimensional 
cues (C₃, C₄, and C₅) completely eliminated differential search time across sectors. All sectors showed homogeneous means, which seemed to indicate that spatial factors had been removed from the visual search problem through the use of auditory cues.

In general, it appears that auditory cueing 
can be used effectively in conjunction with a 
visual search task. In particular, the results 
indicate that: 


1. Auditory cues of various dimensions, such as direction, frequency, and duration, can be used by an “operator” in a visual search task to appreciably decrease the time required to locate and respond to a deviant dial on a simulated instrument panel.

2. Auditory cues can break up habitual search patterns of Ss so that no one panel sector or group of sectors is consistently searched first.


S. A. Mudd and E. J. McCormick 
LEARNING FACTORS AS DETERMINERS OF
PRETEST SENSITIZATION

American University


Several years ago Solomon (1949) sug- 
gested that the frequently used pretest-treat- 
ment-posttest research design was not ade- 
quately controlled in practice by most in- 
vestigators. The possibility of an interaction 
between the pretest and the treatment affect- 
ing the posttest results was logically conceiv- 
able. This possibility would distort an inter- 
pretation of the effect of the treatment if the 
proper control groups were not utilized. Using 
a spelling examination as pretest and training 
in spelling as the treatment, he found that the 
pretest had a depressive effect on the results 
of the training as manifested by the posttest 
results. This was explained in terms of the 
perseveration of the errors made on the pre- 
test by the school children who acted as Ss in 
the experiment. Solomon indicated that a pre- 
test-treatment effect of this sort may very well 
operate in the field of attitude change research 


where the pretest-treatment-posttest research 
design is frequently utilized. 

Recently, Lana (1959a, 1959b) has shown 
that a pretest sensitization does not occur 


where certain attitudinal variables are in- 
volved. In one experiment an attitude of rela- 
tively little importance to the Ss was used as 
the dependent variable (vivisection) and in 
another experiment an attitude of somewhat 
greater concern (ethnic prejudice) was uti- 
lized. In both experiments no interaction ef- 
fect between pretest and treatment or simple 
pretest sensitization (analysis of variance) 
was found. The examination of two probably 
divergent points along a continuum of impor- 
tance of various attitudes held by an indi- 
vidual thus failed to indicate the existence of 
a pretest sensitization of any kind. The purpose of the present study is to examine the contention that pretest sensitization occurs when the pretest acts as a learning device, as in the case of Solomon’s results with spelling training in school children, and that, consequently, this rather than the attitudinal case is the situation in which one is more likely to find the need for careful consideration of the use of various control groups within the pretest-treatment-posttest research design.

In order to examine the effects of learning 
in a comparable situation to those of the ex- 
periments on attitude change (Lana, 1959a, 
1959b) certain conditions had to be fulfilled 
which were different from the conditions used 
by Solomon. Adult Ss were employed in this 
study while Solomon used school children. 
Also, the treatment in the present investigation is identical to the treatment used in the second study mentioned above (Lana, 1959b), allowing for a greater degree of comparability among the three studies. The principal hypothesis of this study is that the administration of a pretest, which entails some learning process, will act to sensitize the individual so as to produce a differential posttest response from individuals not exposed to the pretest. The learning procedure used is the recall of meaningful connected material.


PROCEDURE 


Seventy male students in four introductory 
psychology sections at the American Univer- 
sity served as Ss in the experiment. Males 
were used since they predominated in number 
over the females in the classes, and it has 
been pointed out by King and Cofer (1958) 
that consistent differences in the recall of 
meaningful connected material exist between 
the sexes. These groups were randomly as- 
signed to four treatment conditions summa- 
rized in Table 1. All groups were read a story 
which consisted of a 388-word summary of
the mental health film on ethnic prejudice, 
“The High Wall.” Two of these groups were 
asked to recall the summary immediately 
after the reading by writing it as near to the 
original as possible on a sheet of paper pro- 
vided by E. This first recall was conceptual- 
ized as the pretest. One of these two groups 
viewed the film 12 days after recalling the 
summary. A 12-day time interval between 
TABLE 1
EXPERIMENTAL DESIGN

Group I      Group II     Group III    Group IV
Reading      Reading      Reading      Reading
Recall                    Recall
12 days      12 days      12 days      12 days
Film         Film
Recall       Recall       Recall       Recall

pretest and posttest was maintained for all 
groups in order to ensure comparability in
this respect with the two relevant studies 
alluded to previously. After treatment, Group 
I was asked to immediately recall the story as 
near to the original as possible by writing it 
on a sheet of paper. Group III was simply 
asked to recall the story 12 days later. Group 
II viewed the film without having been first 
asked to recall the story which was read to 
them 12 days previously. Group IV was asked 
to recall the story 12 days after it was read 
to them without having had an initial recall 
or having seen the film. The second recall for 
all groups is conceptualized as the posttest. 
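The four groups form a 2 × 2 factorial arrangement (pretest given or not, by film shown or not), which is what allows a pretest effect, a treatment effect, and their interaction to be separated. The Python sketch below maps each group to its two factors and computes unweighted-means contrasts from four cell means; the function name is hypothetical, and the means in the usage line are made-up numbers, not the study's results.

```python
# Sketch of the 2 x 2 layout behind the four-group design.
# Each group maps to (pretest given?, film shown?).
GROUPS = {
    "I":   (True,  True),   # recall (pretest) + film
    "II":  (False, True),   # film only
    "III": (True,  False),  # recall (pretest) only
    "IV":  (False, False),  # neither
}

def factorial_contrasts(means):
    """Unweighted-means contrasts for a 2 x 2 design.

    `means` maps group label -> posttest cell mean. Each contrast is
    expressed on a per-cell-mean scale (hence the division by 2).
    """
    cell = {GROUPS[g]: m for g, m in means.items()}
    pretest = (cell[(True, True)] + cell[(True, False)]
               - cell[(False, True)] - cell[(False, False)]) / 2
    film = (cell[(True, True)] + cell[(False, True)]
            - cell[(True, False)] - cell[(False, False)]) / 2
    # Half the difference between the film effect with and without a pretest.
    interaction = ((cell[(True, True)] - cell[(True, False)])
                   - (cell[(False, True)] - cell[(False, False)])) / 2
    return {"pretest": pretest, "film": film, "interaction": interaction}

# Hypothetical means in which only the pretest matters:
print(factorial_contrasts({"I": 18.0, "II": 16.0, "III": 18.0, "IV": 16.0}))
# {'pretest': 2.0, 'film': 0.0, 'interaction': 0.0}
```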

The two groups not receiving an initial recall condition were judged to be comparable with respect to recall ability to the two groups receiving the initial recall, since previous observations have shown introductory level students at American University to be similar in their abilities to recall meaningful connected material. (See Lana, 1959a, for a discussion of the general problem of comparability of sample groups utilized in the pretest-treatment-posttest research design.) Scoring for the accuracy of recall was accomplished by dividing the story into 97 “idea units” and counting the number of units present in each protocol.

TABLE 2
MEANS, STANDARD DEVIATIONS, AND Ns FOR POSTTESTS OF ALL GROUPS ON ACCURACY OF RECALL

Group                                    M       SD      N
Group I (recall and film)              18.3
Group II (film)
Group III (recall)
Group IV (neither recall nor film)

TABLE 3
SUMMARY OF ANALYSIS OF VARIANCE ON POSTTEST MEANS FOR ACCURACY OF RECALL

Source                     SS      df      MS        F        p
Treatment                  .01      1      .01      <1      >.05
Pretest                  70.56      1    70.56     4.80     <.05
Treatment × Pretest       2.89      1     2.89      <1      >.05
Error                              67    14.69

Note. Error term was computed by the Walker and Lev approximation method for unequal Ns.


RESULTS 


The difference between the two pretest mean scores was examined with a t test for independent means and found to be not significant at the .05 level. The variances were tested by the F ratio and were homogeneous. The Ss receiving the pretest were therefore judged to have similar abilities in recalling meaningful connected material. A Bartlett test of homogeneity of variance was then performed on the four posttest groups, and the resulting chi square was not significant. A factorial analysis of variance was then applied to the posttest means for the four groups. A summary of these results appears in Table 3.
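Bartlett's test compares the variances of several independent groups; its statistic is referred to a chi-square distribution with k − 1 df, which for four groups is 3 df, with a .05 critical value of 7.81. A minimal standard-library sketch of the textbook statistic follows; the function name and the example scores are hypothetical and do not come from the study.

```python
import math

def bartlett_statistic(groups):
    """Bartlett's chi-square statistic for homogeneity of variance.

    `groups` is a list of samples (lists of scores), each with at least 2 values.
    """
    k = len(groups)
    ns = [len(g) for g in groups]
    variances = []
    for g in groups:
        m = sum(g) / len(g)
        variances.append(sum((x - m) ** 2 for x in g) / (len(g) - 1))  # unbiased estimate
    N = sum(ns)
    pooled = sum((n - 1) * v for n, v in zip(ns, variances)) / (N - k)
    numerator = ((N - k) * math.log(pooled)
                 - sum((n - 1) * math.log(v) for n, v in zip(ns, variances)))
    correction = 1 + (sum(1 / (n - 1) for n in ns) - 1 / (N - k)) / (3 * (k - 1))
    return numerator / correction

# Hypothetical scores: three groups with equal spread, one much more variable.
example = [[1, 2, 3], [0, 5, 10], [4, 5, 6], [7, 8, 9]]
print(round(bartlett_statistic(example), 2))   # 7.56, just below the 7.81 critical value
```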

The F ratio for the treatment effect was 
not significant at the .05 level. The interac- 
tion effect between pretest and treatment was 
not significant, but the pretest effect was sig- 
nificant. As can be seen from Table 3, almost 
all of the variability was contained in the 
effect of the pretest on the posttest means. 
Thus one of the two possible types of sensi- 
tization has been demonstrated in the pre- 
test-treatment-posttest design under condi- 
tions where the pretest acts as a learning 
device. 
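The p column of Table 3 follows from referring each F to an F distribution with (1, 67) df. The standard-library sketch below computes that upper-tail probability through the regularized incomplete beta function, evaluated with the continued-fraction (modified Lentz) method described in Numerical Recipes. This numerical route is a swapped-in technique for illustration, not the procedure reported in the study, and the helper names are hypothetical.

```python
import math

def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))            # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))   # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_upper_tail(f, df1, df2):
    """P(F > f) for an F distribution with (df1, df2) degrees of freedom."""
    return betainc_reg(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))

ERROR_MS, ERROR_DF = 14.69, 67     # error term from Table 3
for source, ms in [("Treatment", 0.01), ("Pretest", 70.56), ("Treatment x Pretest", 2.89)]:
    F = ms / ERROR_MS
    print(f"{source}: F = {F:.2f}, p = {f_upper_tail(F, 1, ERROR_DF):.3f}")
```

With these inputs only the pretest F (about 4.80) falls between the .01 and .05 levels, in line with the p column of Table 3.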

DISCUSSION


Apparently the pretest has acted differen- 
tially from the treatment in influencing the 
posttest scores. If one examines the various 
conditions to which each of the groups was 
exposed, a possible interpretation of the results becomes clear. It has previously been
demonstrated (Clark, 1940; King & Cofer, 
1957) that an immediate recall of meaning- 
ful connected material greatly increases the 
efficiency of a second recall some time later. 
Groups I and III were both pretested (recall 
immediately after presentation of the sum- 
mary) so that one would expect their post- 
test scores to reflect this fact in showing a 
greater efficiency of recall during the posttest 
than Group II with one recall only two weeks 
after the reading. Since the treatment effect 
was not significant the implications are that 
the film, which presented all the material 
found in the summary in plot sequence, was 
not as effective in influencing recall of the 
original material, presented before the pre- 
test, as was the act of recall occurring twice. 
A pretest sensitization operating above the 
effect of the treatment condition is assumed 
to have been demonstrated by these results. 
This study, examined with the two previously 
cited studies (Lana, 1959a, 1959b), indicates 
that a pretest, which is in effect a learning 
task, may be the more usual situation in which 
a pretest sensitization occurs in the pretest- 
treatment-posttest experimental design. Solo- 
mon’s study certainly supports the contention 
that the pretest as a learning task will sensi- 
tize the individual to a later posttest exposure. 
No attitude pretest sensitization by either a 
pretest-treatment interaction effect or a sim- 
ple pretest effect has as yet been demon- 
strated. It should be pointed out that Solomon 
obtained a pretest-treatment interaction as a 
sensitizer, while the present study indicates a 
simple pretest sensitization to be operating 
rather than an interaction effect. Both, how- 
ever, are important in the methodology asso- 
ciated with the design in question. 




CONCLUSIONS 

It is concluded: 

1. In the use of a learning device, such as 
recall of a story, as pretest in the pretest- 
treatment-posttest research design, a pretest 
sensitization is evident. This sensitization dif- 
ferentially affects the reception of a memory 
aid, in the form of a film, in terms of recall 
of the original story in a posttest compared 
with groups having heard the story, but who 
were not pretested in the form of this initial 
recall. 

2. Some degree of learning by the S may 
have to occur during exposure to the pretest 
in order for a pretest sensitization to be evi- 
dent since two previous studies (Lana, 1959a, 
1959b) have failed to show pretest sensitiza- 
tion using attitude change as the pretest-post- 
test measure where little or no learning was 
involved. 
The concept of accident-proneness has been the object of much discussion for approximately 30 years. Kerr (1950) reports that controversies have arisen concerning the extent to which accident-proneness exists or has been a determinant of injuries in industry; Kaywood (1956) states that part of the basis for these controversies is failure to identify the truly accident-prone individual. Harris (1950) and LeShan (1952) state that, as a result of the general confusion concerning accident-proneness, no predictive test for accident-proneness has been developed.

Viteles (1932) quotes Marbe in defining 
the accident-prone individual as having a 
psychophysiological predisposition toward ac- 
cidents. Therefore, it is necessary, according 
to Teel and DuBois (1954), to differentiate 
between “personal” and “situational” acci- 
dents in an effort to isolate other than “per- 
sonal” factors in the accident data. 

Data concerning individuals who are in- 
volved in accidents, through no apparent 
fault of their own, should be carefully scruti- 
nized before being excluded from the study 
or being assigned “situational” causes. 

Webb (1956) discusses accident-proneness as:

A continuing or consistent tendency of a person to have accidents as a result of his stable response tendencies. The problem of identifying accident-proneness is one of establishing the fact that certain individuals had accidents which exceeded those expected on the basis of chance, other things being equal.

In many cases these “other things” add to the contamination of the data and, thus, hinder the search for the accident-prone individual. Teel and DuBois (1954) suggested several refinements in the methodology of psychological research on accidents, three of which were: (a) the need for a more sensitive criterion measure, (b) better differentiation between personal and situational accidents, and (c) determination of exposure to hazard by the individual.

It is the purpose of this study to determine 
the value of differentiating job group hazard 
exposure in identification of the accident- 
prone employee. 

DISCUSSION 

The author attempted to refine the data in 
this study by following the preceding sug- 
gestions. Both minor and disabling accidents 
were included in the study, although the ac- 
cident frequency rate in industry is usually 
based on accidents which resulted in loss of 
time by the employee. Only accidents which 
were caused by unsafe acts of the individual 
were included in the study, thereby excluding 
situational accidents. 

The effect of varying degrees of hazard ex- 
posure is demonstrated in the analysis of 220 
personal injury accidents of a group of 737 
male electric utility employees during the pe- 
riod of one year. These people were engaged 
in skilled and semiskilled occupations. 

Table 1 shows the total observed accident 
distribution and the theoretical distribution. 
Determination of the theoretical distribution was based on the technique demonstrated by Greenwood and Woods and by Newbold and Cobb, as quoted by Viteles (1932), and by Webb and Jones (1953).

Comparison of the theoretical and the observed distributions, which include all employees in the sample studied, reveals no significant difference, χ² = 2.66, df = 3.

TABLE 1
THEORETICAL AND OBSERVED DISTRIBUTION OF 220 ACCIDENTS FOR A GROUP OF 737 EMPLOYEES

No. of         Observed        Theoretical
Accidents      Distribution    Distribution
0                  549             545
1                  157             164
2                   30              25
3                    1               2
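A Poisson (pure chance) distribution is the usual model for the theoretical column: if all 737 employees shared one accident rate, the number of employees with k accidents would follow a Poisson law with mean 220/737 ≈ .30. The Python sketch below computes those expected counts and a chi square against the observed column of Table 1. Treating the paper's technique as a simple Poisson fit is an assumption; the counts it produces (about 546.8, 163.2, 24.4, and 2.6 for three or more) differ slightly from the printed theoretical column, so its chi square of about 2.55 is close to, but not identical with, the reported 2.66. Several expected counts are small, so the chi-square approximation is rough in any case.

```python
import math

def poisson_expected(n_people, n_events, max_k):
    """Expected numbers of people with 0..max_k-1 events, plus a pooled
    'max_k or more' cell, if events were Poisson with a common mean."""
    lam = n_events / n_people
    probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(max_k)]
    probs.append(1.0 - sum(probs))          # tail cell: max_k or more accidents
    return [n_people * p for p in probs]

def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [549, 157, 30, 1]                # 0, 1, 2, 3 accidents (Table 1)
expected = poisson_expected(737, 220, max_k=3)
print([round(e, 1) for e in expected])     # [546.8, 163.2, 24.4, 2.6]
print(round(chi_square(observed, expected), 2))   # 2.55
```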

In order to differentiate according to rela- 
tive hazard exposure, the data are also ana- 
lyzed by job groups. It is assumed that in- 
dividuals engaged in the same job task are 
exposed essentially to the same physical con- 
ditions; consequently, variation in individual 
accident experience should reflect individual 
difference in susceptibility to accidents. 

Accident frequency rates for different job 
groups are shown in Table 2. The accident 
frequency rates range from .12 to .65 acci- 
dents per employee, M = .30, SD = .15. 

The accident frequency rate of one group, 
Lineman 3/C, is significantly higher than 
the average, which indicates that this group 
had greater than chance accident experience. 
Analysis of the accident data for this group 
reveals several factors which probably influ- 
ence their unusually high accident rate. Men 
engaged in this occupation are learning a 
new and unusually hazardous job, they are in 
their early twenties, and are relatively inex- 
perienced. According to Schulzinger (1954), 


in his analysis of 35,000 accidents, these fac- 
tors are some of the most important circum- 
stances under which an accident is most likely 
to occur. 
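The text reports the spread of group rates (M = .30, SD = .15) but not the test used to flag Lineman 3/C. One simple reading, offered here only as an assumption, is a standard score for the group rate against the distribution of group rates. The sketch further assumes that the top of the reported range, .65, is the Lineman 3/C rate, which the text suggests but does not state; the function names are hypothetical.

```python
import math

def rate_z(rate, mean_rate, sd_rate):
    """Standard score of one group's accident frequency rate."""
    return (rate - mean_rate) / sd_rate

def upper_tail_normal(z):
    """One-tailed P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical: Lineman 3/C assumed to be the group with the .65 rate.
z = rate_z(0.65, 0.30, 0.15)
p = upper_tail_normal(z)
print(round(z, 2), round(p, 4))   # 2.33 0.0098
```

Under these assumptions the rate lies about 2.33 SD above the mean of the group rates, which is just beyond the one-tailed .01 point of the normal curve.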

It was determined by the writer that individuals in this group are emotionally stable and enjoy better than average physical health; therefore, any approach to accident prevention in this group, such as a safety training program, must include all of the employees in this particular job.

TABLE 2
ACCIDENT FREQUENCY RATES FOR JOB GROUPS MATCHED IN TERMS OF HAZARD EXPOSURE

                      No. of        No. of        Accident
Job Groups            Employees     Accidents     Frequency Rate
Lineman 1/C             160
Lineman 2/C             112
Lineman 3/C              34
Groundman               189
Maintenance 1/C          43
Maintenance 2/C
Maintenance 3/C
Meterman 1/C
Meterman 2/C
Meterman 3/C
Total                   737           220

* Significant at .01 level of confidence.

TABLE 3
CHI SQUARE VALUES FOR COMPARISONS OF THEORETICAL AND OBSERVED ACCIDENT DISTRIBUTIONS OF HAZARD MATCHED JOB GROUPS

Job Groups            Value of χ²     df
Lineman 1/C               .54
Lineman 2/C              1.45
Lineman 3/C              3.91
Groundman                5.04
Maintenance 1/C           .00
Maintenance 2/C          1.33
Maintenance 3/C           .00
Meterman 1/C              .00
Meterman 2/C              .00
Meterman 3/C
Total Group              2.66          3

Theoretical and observed accident distributions of each job group were compared by the chi square technique. The results are depicted in Table 3. No group had a greater than chance number of accidents; therefore, no accident-prone individuals were identified by comparing theoretical with observed accident distributions by using the traditional statistical techniques. It should be noted that χ² for Lineman 3/C was not significant even though this group has a significantly higher accident rate (Table 2) than the other groups in the study. However, close scrutiny of the accident records of the individuals who have sustained multiple accidents reveals data which suggest the presence of accident-proneness. For example, six individuals’ accident rates for a period of seven years were significantly higher than chance; therefore, by definition, they are “accident-prone.”

Analysis of psychological variables associ- 
ated with or contributing to the condition 
known as accident-proneness was impracti- 
cable, in this study, because an insufficient 
number of accident-prone individuals was 
identified. 

It is important to note that some of the employees who experienced multiple accidents are accident-prone. More importantly, however, comparison of the observed accident distribution with the theoretical distribution for the total group fails to yield results from which valid inferences concerning the existence of accident-proneness may be drawn.

Hazard exposure differentiation, by job 
groups, must be made in an effort to refine 
accident data so that variations in accident 
experience will reflect variation in individual 
behavior, rather than differences in exposure 
to hazardous conditions. 


SUMMARY 


Various problems concerning identification of accident-prone individuals have been discussed. It was pointed out that care should be exercised when interpreting results of accident distribution analysis. Significant differences between theoretical and observed accident distributions of hazard-exposure matched groups may exist without being detected if only the total group distribution is analyzed. Results of this study indicate that groups should be matched on the basis of hazard exposure before an attempt is made to analyze data for the presence of accident-proneness.

No accident-prone individuals were identified by the traditional statistical analysis of the accident experience of 737 employees during the one year; consequently, analysis of psychological variables associated with accident-proneness was impracticable in this study.

Paul L. Crawford

It is suggested that the accident-prone in- 
dividual may be identified in terms of chance 
expectations as compared with others who are 
exposed to essentially the same occupational 
hazards, and then an attempt to identify the 
psychophysiological attributes associated with 
accident-proneness should be made. 
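The suggested procedure can be sketched as a Poisson screen applied within one exposure-matched group. The group's mean count serves as the chance expectation, and an individual is flagged when a count at least as high as his would be improbable under that expectation. This Python sketch is illustrative only. The counts, the .05 cutoff and the function names are hypothetical, and including each worker in his own group mean is a simplification.

```python
import math
from statistics import mean

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below

def flag_unusual(counts, alpha=0.05):
    """Indices of workers whose count is improbably high for the group mean."""
    lam = mean(counts)  # chance expectation for this exposure-matched group
    return [i for i, k in enumerate(counts) if poisson_tail(k, lam) < alpha]

# Hypothetical one-year accident counts for ten workers with the same exposure.
group = [0, 0, 1, 0, 3, 0, 1, 0, 0, 0]
print(flag_unusual(group))  # [4]
```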


REFERENCES 

Harris, F. J. Can personality tests identify accident-prone employees? Personnel Psychol., 1950, 3, 455-459.

Kaywood, R. Who are the "accident-prone"? National Safety News, February 1956, p. 110.

Kerr, W. A. Accident proneness of factory departments. J. appl. Psychol., 1950, 34, 167-170.

LeShan, L. L. Dynamics of accident-prone behavior. Psychiatry, 1952, 15, 73-80.

Schulzinger, M. S. Accident proneness. National Safety News, June 1954, p. 32.

Teel, K. S., & DuBois, P. H. Psychological research on accidents. J. appl. Psychol., 1954, 38, 397-399.

Viteles, M. S. Industrial psychology. New York: Norton, 1932. P. 341.

Webb, W. B. The illusive phenomena in accident proneness. Home Safety Rev., April 1956, p. 24.

Webb, W. B., & Jones, E. R. Some relations between two statistical approaches to accident proneness. Psychol. Bull., 1953, 50, 133-136.


(Received August 19, 1959) 





Journal of Applied Psychology 
1960, Vol. 44, No. 3, 195-202 


RELATIONSHIPS AMONG CRITERIA OF JOB PERFORMANCE¹


STANLEY E. SEASHORE, BERNARD P. INDIK, AND BASIL S. GEORGOPOULOS


Institute for Social Research, University of Michigan 


This paper reports an empirical exploration 
of the generality of relationships among cri- 
teria of job performance in an industrial 
situation. This is one report of a series of 
studies aimed at discovering how the per- 
formance of individuals and of organizations 
can be conceptualized and measured adequately.² Here we are concerned with the
question whether a set of intercorrelations 
among job performance criteria is likely to 
be unique to the population and situation 
studied, or whether, instead, it can be con- 
sidered to be an approximation of some gen- 
erally valid system of criterion relationships. 


THE PROBLEM 


While the use of multiple criteria in studies 
of individual and organizational job perform- 
ance is becoming more common, it is still 
rare. It is even more rare that multiple cri- 
teria are chosen or treated in terms of some 
theory of the composition of job performance, 
or in terms of some rational basis for choosing 
among, combining, or treating separately the 
various measures of performance. The few 
studies of multiple criteria have without ex- 
ception raised new and serious problems, as 
well as doubts about assumptions that are 
commonly made (Gaier, 1952; Rush, 1953). 
The study by Kelly (1957) of the perform- 
ance of medical students is an example. From 
an analysis of 32 criterion variables, he con- 
cluded that the criterion relationships are 
variable in size and sign, and that most of 
the common variance is accounted for by five factors, of which four are relatively independent of one another. It is clearly not reasonable, in the case of these medical students, to take one measure as representing total performance, or to combine the measures in any simple way.

¹ This work has received financial support from three sources: the Faculty Research Fund of the Rackham School of Graduate Studies, University of Michigan; the Institute of Labor and Industrial Relations, University of Michigan and Wayne State University; and the firm which collaborated in the study.

² Other reports in preparation deal with the generality of the factorial composition of performance measures, with the predictability of alternative performance measures, and with the empirical testing of alternative models for predicting the performance of individuals and organizations.

One approach to the solution of multiple criterion problems lies in postulating a unidimensional construct representing “overall job performance” or “net performance,” and in treating various separate measures as independent estimates of such a single variable.
This approach leads to attempts to combine 
the elemental criteria through techniques 
which maximize a common factor, maximize 
the predictability of the joint elements, or 
weight the elements in accordance with their 
reliability or predictability (Brogden & Tay- 
lor, 1950; Nagle, 1953; Thompson, 1940). 
The logical problems here arise from the 
common finding that the elements of per- 
formance may be negatively correlated, that 
the elements interact, and that the resulting 
single measure does not reflect well the values 
implied by the initial choice of elemental 
measures. 

Progress in the conceptualization and meas- 
urement of work performance is likely to lie 
in the direction of creating some theory (or 
several theories) of performance comparable 
in nature and complexity to those which have 
been developed in other areas of human be- 
havior, e.g., “intelligence” and “personality.” 
However, to move in such a direction requires 
that we first do some exploration of the 
range and incidence of the relational phe- 
nomena with which we are concerned. In this 
paper we will deal with only three issues: (a) whether “job performance” reasonably can be treated as a unidimensional construct, (b) whether the relationships among a set of criterion measures are constant, within error limits, in similar organizations rather than being unique in each organization, and (c) whether the relationships among a set of job performances are the same when treated as referring to a population of individuals as they are when treated as referring to a population of sets of individuals comprising separate organizations.


SOURCE AND NATURE OF DATA 


For the present analysis we have confined our at- 
tention to five job performance variables, four of 
which are “objective” and one judgmental. The vari- 
ables are called (a) overall effectiveness, (b) pro- 
ductivity, (c) chargeable accidents, (d) unexcused 
absences, and (e) errors. It is not proposed that 
these five variables represent all aspects of perform- 
ance. They were selected from a larger roster of 
measures because of their face validity with refer- 
ence to the purpose of the firm, their objectivity, 
their measured or estimated high reliability, and 
their relevance to both individual and organizational 
performance. 

The data concern a delivery service firm having 
operations in several metropolitan areas in different 
parts of the country. Each area is organized as a
“plant” with two or more major divisions, and each 
division has several operating units called “stations.” 
A typical station has a supervisor, an assistant su- 
pervisor, several “night men” or “loaders,” and about 
25 drivers or deliverymen who work days deliver- 
ing packages on their respective routes. The stations 
are geographically separated from one another, and 
somewhat variable in size but otherwise remarkably 
alike. They perform the same kind of activity, em- 
ploy uniformly standard equipment and procedures, 
draw upon the same organizational and financial re- 
sources, employ the same system for establishing 
work standards, observe the same managerial and 
personnel policies, and maintain uniform records.
Twenty-seven such stations, and their 975 nonsuper- 
visory employees, comprise the populations for this 
report. 

The data allow the computation of relationships 
among the five criteria for different populations as 
follows: (a) for all individuals (N = 975), (b) for
all stations (N = 27), and (c) for individuals in
each of the separate stations (Ns ranging from 13 
to 54). 
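The three populations differ only in how the same individual records are grouped. The following minimal Python sketch assumes a hypothetical record layout of (station, score) pairs with invented values. Population (a) pools every individual. Population (b) reduces each station to the mean of its members' scores. Population (c) keeps individuals but groups them by station, so that correlations can be computed within each station.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical individual records: (station_id, productivity score).
records = [("A", 52), ("A", 48), ("A", 61), ("B", 40), ("B", 45), ("C", 66), ("C", 58)]

# (a) all individuals pooled
individuals = [score for _, score in records]

# (c) individuals grouped by station, for within-station analyses
by_station = defaultdict(list)
for station, score in records:
    by_station[station].append(score)

# (b) one value per station: the mean of its members' scores
station_means = {s: mean(scores) for s, scores in by_station.items()}
print(station_means)
```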


DESCRIPTION OF VARIABLES 


Productivity. This measure was derived from in- 
dividual worker records showing “allowed” and 
“actual” hours for an assigned daily task. Allowed 
times are synthetic standard times computed from 
locally established elemental times, or else direct 
standards derived from time study of the particular 
job in question. The data were reduced to the num- 
ber of man-hours over or under “allowed” for a 
given period of time. Productivity data were avail- 
able for all members of each station except those 
few assigned to jobs for which there are no time 
standards. Data for a one-month period were used. 
The productivity of a station was derived by aver- 
aging the productivity of the members. These data 
have interval scale properties, and for individuals range from 30 to 70. Individual performance in terms of this productivity measure is remarkably reliable, giving a correlation of about .91 for successive two-week periods.³

Effectiveness: Individual. Each station manager 
was asked to rank-order all of his men on the basis 
of his judgment of the overall quality of their job 
performance, taking into account any special circum- 
stances relevant to each individual’s assignment. 
Rankings were sent directly to the research team as 
confidential data. Within each station, the rank-orders were normalized and reduced to a scale with a range of 1 to 7.
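The exact normalizing procedure is not given. One plausible reading, sketched here in Python as an assumption, has three steps. First, convert each within-station rank to a mid-rank percentile. Next, take the corresponding normal deviate. Finally, cut the deviates into seven bands 0.75 SD wide, centred on 4. The band width and the function name are hypothetical choices.

```python
import math
from statistics import NormalDist

def ranks_to_seven_point(n, band_width=0.75):
    """Map ranks 1..n (1 = best) to a normalized 1-7 scale (assumed method)."""
    scores = []
    for rank in range(1, n + 1):
        p = (n - rank + 0.5) / n                      # mid-rank share ranked lower
        z = NormalDist().inv_cdf(p)                   # normal deviate for that share
        score = 4 + math.floor(z / band_width + 0.5)  # nearest band, centred on 4
        scores.append(min(7, max(1, score)))          # keep within 1..7
    return scores

print(ranks_to_seven_point(7))
```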

Effectiveness: Station. Independent judgments were obtained from a group of operating and staff managers concerning the relative effectiveness of the various stations. The raters had first-hand knowledge of the stations they rated, but were not directly involved in the operation of the stations. Judgments were obtained in the form of ratings on a five-point scale of the overall performance of the station as a whole during the prior six months. Raters were given instructions intended to assure that they would take into account any unique circumstances in each station, and take into account all aspects of station performance. The ratings were sent directly to the research team as confidential information, in order to maximize the independence of judgments. This measure was actually obtained before collection of other data, and was a basis for selection of stations for study. Stations on which there was disagreement among raters were eliminated from the study. Consistency among raters exceeded expectations; reliability cannot be known but is probably high.⁴

Accidents. Since the firm operates trucks on public roads and streets, accident hazards and costs tend to be high. Each accident is investigated, and if it involves any degree of negligence or improper performance on the part of the employee, it is considered a “chargeable accident.” The number of chargeable accidents during the two years preceding the study was used as a measure of performance. The range for individuals is from 0 to 9+. Station performance is represented by the mean of individual scores, and ranges from 0 to 3.81 chargeable accidents. These measures have ratio scale properties, and in the case of the individual data, scores are bunched at the low end of the scale.

Absences. Since the work of a station is highly coordinated and tied to a daily schedule, an unanticipated absence may sometimes be costly and disruptive. Individual scores on absence represent the actual number of instances of “unexcused absence” during a two-year period, with individual scores ranging from 0 to 9+. Station performance on absences is represented by the mean of individual member scores, and ranges from 0 to 1.51. These data have ratio scale properties.

³ For a more detailed description of this variable, see Georgopoulos and Tannenbaum (1957).

⁴ Both individual and station effectiveness ratings are, no doubt, somewhat contaminated by the rater’s knowledge of past performance records. For example, they have common variance with measured productivity of 10% and 55%, respectively.

Errors. This variable is intended to represent the 
quality of work performed by a driver, and is based 
on his performance in making delivery in difficult 
cases. For each driver, there is a daily count of 
“C.O.D.’s” and “send agains,” these being instances of nondelivery. Return of a package to the station may, of course, occur for reasons beyond the control of the driver, but there are also differences in performance arising out of the driver’s familiarity with the habits of people on his route and out of the driver’s ingenuity, effort, and judgment in making delivery through neighbors and in other ways. To an extent, these all reflect “quality” of performance. For present purposes, individual scores are based on a count of nondeliveries over a one-month period. The reliability of this variable is r = .76 for
successive two-week periods for individuals. Station 
performance on this variable is based on the mean 
of member scores, and ranges from 1.85 to 4.69 in- 
stances of nondelivery. 


PLAN OF ANALYSIS 


Pearson r correlations were computed for
all pairs of variables for all types of analyses. 
In some cases, some error is involved in this 
choice of procedure as the absence and acci- 
dent data do not meet the homoscedasticity 
requirement and one variable in each set has 
only ordinal properties. The data were then 
analyzed in a manner that allows assessment 
of the hypotheses stated below. 

It is a common practice to assume that any
one of several relevant criteria of job per- 
formance may be taken to represent the to- 
tality of job performance. This is the case in 
all studies in which a single variable (such 
as productivity, length of tenure, or accident 
rate) is used as a basis for evaluation of pre- 
dictor variables (e.g., selection tests, training 
methods, etc.). The implicit model of “over- 
all job performance” is one of multiple ele- 
ments having additive or multiplicative rela- 
tionships such that the elemental measures 
may be used separately as estimates of “true” 
performance or that the elements may be 
combined to reduce measurement error or 
bias. This conception is tenable only if it is 
generally true that job performance criteria 
are significantly related in consistent ways, 
and if the relations, with due allowance for 
measurement errors and unreliability, are 
high. Hypothesis: Intercorrelations among a 
set of job performance measures for a given 
homogeneous population of individuals or organizations will be consistent (in terms of
positive or negative association) and rela- 
tively high. It is our expectation that this 
hypothesis will be found invalid, and that 
the lack of validity of the model from which 
it derives will be demonstrated. 
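The phrase “with due allowance for measurement errors” can be made concrete with Spearman's correction for attenuation. This correction estimates the correlation between true scores from an observed r and the two reliabilities. The Python sketch below uses the reliabilities reported for productivity (about .91) and errors (.76); the observed r of -.25 is hypothetical. With reliabilities this high the correction is modest, so it cannot by itself turn small observed correlations into high ones.

```python
import math

def disattenuate(r_xy, r_xx, r_yy):
    """Spearman's correction: estimated correlation between true scores."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Reliabilities from the text; the observed r of -.25 is hypothetical.
corrected = disattenuate(-0.25, 0.91, 0.76)
print(round(corrected, 2))  # -0.3
```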

Few investigators have concerned them- 
selves with the question of comparability be- 
tween job performance at the level of the in- 
dividual and job performance at the level of 
the organization. This neglect probably arises 
out of the practical difficulty of getting per- 
formance measures for a homogeneous set of 
organizations, and measures that are compa- 
rable at both individual and organizational 
levels. We propose here, as a basis for analy- 
sis, the simplest conception of the matter, 
namely, that there exists some generally valid 
system of relationships among a given set of 
performance variables such that the same re- 
lationships hold whether the performances are 
those of an individual or those of a set of in- 
dividuals who share the work of an organiza- 
tion. Hypothesis: The pattern of intercorrela- 
tions among a set of job performance vari- 
ables, with allowance for measurement errors, 
will be similar in size and sign as between the 
individual and organizational levels of analy- 
sis. A confirmation of this hypothesis will 
lend support to the view that there may 
exist some widely generalizable system of 
relationships among the several components 
of job performance. A denial of the hypothe- 
sis will suggest that the conceptual mean- 
ing of the job performance variables changes 
when one shifts the referent from individual 
to organization. It would suggest also the 
possibility that there is a still different set of 
relationships among the variables which ap- 
plies to other units of analysis, e.g., each of 
the subtasks comprising an individual’s job 
as compared with the individual’s total job 
or the organization’s job. 

Another common assumption about job per- 
formance measures relates to the presumed 
invariance of relationships across a set of 
similar organizations. The limited available 
published data show wide variations in rela- 
tionships among aspects of job performance 
for unlike occupations and unlike organiza- 
tions. However, it is customary to generalize 
findings from one situation to a class of apparently similar situations. It is possible that
such generalizations are valid. It is also pos- 
sible that each situation is unique and that 
there does not exist a “pattern” of criterion 
relationships which holds for all members of 
a set of similar organizations. Our hypothesis, 
however, is stated positively, as follows: The 
relationships among job performance criteria 
for the individuals in any one organization 
are, within limits of measurement error, rep- 
resentative of the relationships which hold 
across a homogeneous set of such organiza- 
tions. Our expectation is that there will be 
more variance in criterion relationships than 
can be attributed to the random effects of 
measurement error and sampling. 


RESULTS 
Hypothesis I 


Results relevant to Hypothesis I are pre- 
sented in Table 1. Table 1-A shows interrela- 
tionships among the five criteria for the popu- 
lation of 27 organizations (stations). Of the 
10 correlations only five have a sign consist- 
ent with the hypothesis that “good” perform- 
ance on any one variable will be related to 
“good” performance on others. Four of the 
10 correlations are statistically significant at 
the .05 level or lower, and the sizes of cor- 
relations bear no apparent relationship to the 
known or estimated reliability of the compo- 
nent variables. It appears that three of the 
five criteria (effectiveness, productivity, and errors) constitute a set of internally consistent criteria, while accidents and absences are
inconsistent with that set and unrelated to 
each other. For example, good performance 
with respect to accidents (i.e., low accident 
rate) is associated with poor performance 
with respect to productivity (i.e., low pro- 
ductivity); low accident rates are associated 
with high error rates. It appears that “job 
performance” as measured by these variables 
at the organizational level, does not consti- 
tute a unidimensional construct and may 
even be differently constituted, in terms of 
direction of relationships, than might be ex- 
pected. That is, the five variables are not all 
significantly related to one another, nor do 
all the variables appear to belong to a com- 
mon cluster. 
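The sign check behind Hypothesis I depends on each criterion's polarity: for accidents, absences and errors, a low score is “good.” The small Python sketch below shows that bookkeeping. The polarity map follows the variable definitions given earlier. The three correlations are hypothetical values chosen to mirror the pattern just described, in which low accident rates go with low productivity and with high error rates.

```python
# +1: a higher score is better performance; -1: a lower score is better.
POLARITY = {"effectiveness": 1, "productivity": 1,
            "accidents": -1, "absences": -1, "errors": -1}

def consistent_with_hypothesis(a, b, r):
    """True if r's sign means good performance on a goes with good on b."""
    expected_sign = POLARITY[a] * POLARITY[b]
    return r * expected_sign > 0

# Hypothetical correlations, not the values in Table 1.
example = {("productivity", "accidents"): 0.20,   # more output, more accidents
           ("accidents", "errors"): -0.30,        # fewer accidents, more errors
           ("effectiveness", "errors"): -0.40}    # rated better, fewer errors
n_consistent = sum(consistent_with_hypothesis(a, b, r)
                   for (a, b), r in example.items())
print(f"{n_consistent} of {len(example)} signs consistent")
```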

Table 1-B replicates Table 1-A but shows 
the interrelationships for the population of 
975 individuals. Of the 10 correlations, nine 
have signs consistent with Hypothesis I, but 
the reversed one is statistically significant. In 
addition, the correlations are all small in 
absolute magnitude, indicating that there is 
little common variance among the five vari- 
ables. Considering both sign and size of cor- 
relations, there is general confirmation of the 
conclusion reached on the basis of Table 1-A, 
namely, that there appears to be an inter- 
nally consistent set of three variables, and 
two other variables independent of that set. 

Table 1-C is an alternative representation of the intercorrelations of the five variables at the individual level but, in this case, the data show the weighted average intraclass correlation computed separately for each of the 27 organizations. This treatment removes any effects, whether spurious or valid, arising from the clustering of our 975 individuals into 27 organizational sets. The findings in this table more nearly conform to Hypothesis I than do the results in Tables 1-A and 1-B, as there is now only one insignificant reversal of sign, and the pattern differentiating the accident and absence variables from the remaining three is less apparent. However, the correlations remain low, and there is little common variance in our set of five performance measures.


TABLE 1
INTERCORRELATIONS AMONG FIVE JOB PERFORMANCE CRITERION VARIABLES IN 27 OPERATING STATIONS OF A DELIVERY SERVICE FIRM (DECIMALS OMITTED)

[Table body illegible in the source scan. Columns: Population A (Stations), Population B (Individuals), Population C (Individuals Within Stations). Rows: Effectiveness, Productivity, Chargeable accidents, Unexcused absences, Errors (nondeliveries).]

* Significantly different from zero at .05 level or better. Population A: 27 stations. Intercorrelations based on means of member scores. Population B: 975 individuals. Nonsupervisory employees in all stations. Population C: Approximately 975 individuals. Intercorrelations shown are weighted average within-group correlations for individuals grouped according to station membership. Zero correlations arising from instances of within-station invariance are omitted from data. Station Ns range from 13 to 54, with a mean of 36.


TABLE 2
SUMMARY OF 270 INTERCORRELATIONS AMONG FIVE JOB PERFORMANCE CRITERION VARIABLES FOR 975 NONSUPERVISORY EMPLOYEES WITHIN EACH OF 27 OPERATING STATIONS OF A DELIVERY SERVICE FIRM

[Table body largely illegible in the source scan. Columns (variable pairs): Effectiveness versus Productivity, Accidents, Absences, and Errors; Productivity versus Accidents, Absences, and Errors; Accidents versus Absences and Errors; Absences versus Errors. Rows: Obtained range of correlationsᵃ; Number of correlations that are +, -, 0ᵇ; Weighted average within-group correlationᶜ; Probability that the obtained correlations are homogeneousᵈ.]

ᵃ Omitting certain extreme cases that are unreliable for reason of small N. Actual obtained ranges are greater for some pairs.
ᵇ Most of the zero correlations are spurious; they arise from instances of invariance on one of a pair of variables, usually “absences.”
ᶜ Omitting spurious zero correlations arising from instances of invariance on one of a pair of variables.
ᵈ Probability was assessed by a chi-square test (Snedecor, 1948, pp. 151-155). Chi squares having p’s of 0-.05 and .05-.20 were considered to represent “very low” and “low” probability that the obtained distribution of correlations could have arisen from a sampling of a correlation equal to the weighted average within-group correlations shown above.


Hypothesis II


This hypothesis proposed that the intercorrelations among the five criteria will be similar in size and sign at the individual and organizational levels of analysis. Comparing Tables 1-A and 1-B, it can be seen that the correlations tend to be higher in the case of the organizational level of analysis, and substantially higher in some cases. For example, effectiveness vs. productivity shifts from +.28 to +.74, and accidents vs. errors shifts from -.18 to -.65. There is one rather large shift involving a reversal of sign, from +.15 to -.11 in the case of absences vs. errors. These results do not permit any conclusive interpretation, as the obtained differences in correlation size are variable and not subject to statistical test of significance. They may be statistical artifacts.⁵ The differences in pattern of correlations between individual and group levels are also rather modest and possibly insignificant.


Hypothesis III


Hypothesis III proposed that there is an 
invariance of criteria interrelationships across 
a set of similar organizations. To test this hy- 
pothesis, the intercorrelations among our five 
variables were computed separately for indi- 
viduals in each of the 27 stations. The results 
are shown in condensed form in Table 2. 


⁵ We have here a specific case of the more general
problem of interpreting the conceptual and statistical 
differences between the individual datum and the ag- 
gregate or group datum. It is not clear whether the 
differences we show occur because of, or in spite of, 
purely statistical considerations. 
According to the operational hypothesis, we
should expect the correlations between any 
pair of variables to be the same for all sta- 
tions, within limits of measurement and sam- 
pling error. We should further expect that the 
average within-groups correlation for each 
pair of variables should, within error limits, 
be representative of the obtained distribution 
of correlations. These expectations are met 
with a high level of confidence only in the 
case of two pairs of variables—productivity 
vs. accidents, and absences vs. errors. With 
reference to the other eight pairs, one cannot 
with confidence say that there exists for this 
set of organizations any correlation which is 
“representative” or “typical” of the relation- 
ships between a given pair of job performance 
variables.
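The expectation that station correlations share one common value can be checked with the usual Fisher z procedure. We assume this is close to the chi-square method cited from Snedecor: transform each r to z, average the z values with weights N - 3, and sum the weighted squared deviations from that average. The three stations below are hypothetical. For 27 stations the statistic would have 26 degrees of freedom.

```python
import math

def homogeneity_test(rs, ns):
    """Fisher-z test that several correlations estimate one common value.
    Returns (weighted average r, chi-square, degrees of freedom)."""
    zs = [math.atanh(r) for r in rs]          # Fisher's z transform
    weights = [n - 3 for n in ns]             # inverse of the variance of z
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    chi_sq = sum(w * (z - z_bar) ** 2 for w, z in zip(weights, zs))
    return math.tanh(z_bar), chi_sq, len(rs) - 1

# Hypothetical correlations for one pair of criteria in three stations.
r_bar, chi_sq, df = homogeneity_test([0.5, -0.4, 0.1], [30, 40, 36])
CRITICAL_05_DF2 = 5.991  # chi-square table value for df = 2, alpha = .05
print(f"average r = {r_bar:+.3f}, chi-square = {chi_sq:.2f} on {df} df, "
      f"homogeneous at .05? {chi_sq < CRITICAL_05_DF2}")
```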

The absolute size of the obtained differ- 
ences in correlation is quite striking for some 
of the pairs of variables. For example, a sta- 
tion chosen at random from this firm could 
produce a correlation between accidents and 
errors anywhere from about -.50 to about +.50. For effectiveness vs. accidents, the range is from -.41 to +.44. For effectiveness vs. productivity, the range is from -.56 to +.83. The variation is so great that even the pair of variables with highest average correlation produced two reversals of sign.⁶
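How wide a spread sampling error alone could produce depends on station size. The rough Python sketch below uses Fisher's z, whose standard error is 1/sqrt(N - 3). It computes the interval within which about 95% of sample correlations would fall if the true correlation were zero. It does so for the smallest, mean and largest station sizes given in the note to Table 1. With 27 stations, one or two values outside such a band would be expected by chance. For that reason a formal homogeneity test is arguably a sounder basis than the raw range.

```python
import math

def chance_band(n, rho=0.0, z_crit=1.96):
    """Approximate 95% interval for a sample r, via Fisher's z with SE 1/sqrt(n - 3)."""
    centre = math.atanh(rho)
    half = z_crit / math.sqrt(n - 3)
    return math.tanh(centre - half), math.tanh(centre + half)

for n in (13, 36, 54):  # smallest, mean, and largest station N
    lo, hi = chance_band(n)
    print(f"N = {n}: {lo:+.2f} to {hi:+.2f}")
```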

One must conclude from this evidence that 
the relationships among various aspects of 
job performance are highly variable, even 
within a set of unusually similar organiza- 
tions. For organizations of lesser homoge- 
neity, one should expect even greater varia- 
tion in criterion interrelationships. The hy- 
pothesis that there exists some generalizable 
set of relationships between various aspects 
of job performance is not supported. 


DISCUSSION 


These results indicate that relationships among certain different aspects of job performance are generally small, and that the size and direction of relationships are to a large degree unique to each population and situation, and somewhat different for organizations as contrasted to individuals. There is little support for the notion that there may exist some generalizable pattern, or set of patterns, describing the composition of job performance and the relationships among the components of job performance. The evidence, however, does not warrant the serious consideration of an extremely opposite view, namely, that criterion interrelationships are indeterminate or random. The following discussion presents some reasons for discounting this view and possible ways of finding the elusive organizing principles which will reveal the orderliness of otherwise random-appearing phenomena.

⁶ These ranges are, of course, unstable, as a single instance sets each of the extremes reported here. A replication of this study would not produce the same ranges, but would show a similar degree of variability.

Choice of Criterion Variables. The present 
study deals with only five job performance 
variables, chosen because they have face va- 
lidity, objectivity, and either measured or 
estimated high reliability. It is unlikely that 
these measures are all independent of one an- 
other. It is likely that there is some interac- 
tion among them. It is unlikely that they 
adequately represent any true factor struc- 
ture of job performance that may exist. A 
proper test of our hypotheses would require 
the use of variables that are factorially pure 
and cover a wider range of on-the-job be- 
havior. One clue to the relevance of this criti- 
cism of the results is to be seen by post facto 
interpretation of the data itself. Had we used 
only three variables—effectiveness, produc- 
tivity, and errors—the results would have 
conformed much more closely to our hy- 
potheses in most (but not all) tests. 

Choice of Firm. The study was done in a 
firm in which the operating units (stations) 
are unusually similar in structure and func- 
tion. This was considered an advantage, as 
this high degree of homogeneity can maximize 
the possibility of finding uniformities in cri- 
teria interrelationships of a kind that would 
confirm our hypotheses. However, it is also 
true that a high degree of population homo- 
geneity can allow an undetected conditioning 
variable to maximize its effect. This is an im- 
probable but possible defect in the study. 

Range of Variation. The obtained correla- 
tion between a pair of criterion variables de- 
pends in part upon the “range of talent” or 
absolute range of variation on the two measures. It may be that our populations of individuals and organizations are so uniform that
only small correlations could be generated, 
and that the size of these depended more on 
range than on “true” covariation. This would 
account for some of the apparent random- 
ness of obtained correlations. Although plau- 
sible, this argument appears to have little 
merit, as the absolute variations in produc- 
tivity, error rates, and accident rates, for ex- 
ample, seem relatively large rather than rela- 
tively small in comparison with the ranges 
commonly reported from other kinds of firms. 
A specific test of this possibility was made in 
the case of Hypothesis III, as the several or- 
ganizations differed considerably in the ab- 
solute range of performance on some of the 
five variables. However, there proved to be 
no significant connection between absolute 
range and the obtained size or direction of 
criterion interrelationships, except in the case 
of accident rates. Even in this case, removing from the population of organizations those which were relatively invariant on one or more variables had no significant effect on the overall results.

If we assume that the results are generally valid, in spite of these defects, there remains the problem of speculating about the modifications and elaborations of theory that will be necessary in order to measure and assess uniformities that we assume to exist in the relationships among elements of job performance. Three possibilities are here proposed.

Patterning. It is possible that an individual or an organization has a limited number of “choices” among alternative patterns of criterion relationships. One could speculate, for example, that if one aspect of job performance (e.g., safety, or productivity) is given priority in job performance, then a given pattern of relationships among other performance variables may necessarily follow. The patterning of performance variables may also follow from individual or organizational preferences among certain nonperformance values, personnel policies, and the like. The presence of several such “patterns,” each uniform across a subset of a population of organizations, could produce a superficial appearance of randomness in criterion relationships. An inspection of the data for our 27 organizations, however, did not reveal any patterns which recurred frequently enough to encourage more precise analysis along this line.

Conditioning Variables. It is possible that 
there are some conditioning variables, operat- 
ing like catalysts in a chemical system, which 
need to be taken into account in order to 
discern uniformities in the relationships among 
job performance variables. For example, it is 
plausible that productivity and accident rates 
may have a positive relationship under condi- 
tions of low hazard and a negative relation- 
ship under conditions of high hazard, and 
that both conditions could occur in the same 
organization. 
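The conditioning-variable argument can be illustrated with a small simulation. This is a hypothetical sketch, not an analysis of the study's data: it assumes two sets of work units within one organization, a linear productivity-accident relation with slope +0.8 under low hazard and -0.8 under high hazard, and normally distributed noise. Within each hazard condition the correlation is strong, yet pooling the two conditions drives it toward zero.

```python
import random

def pearson(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate_units(slope, n, rng):
    """Hypothetical units: accident rate = slope * productivity + noise (standard units)."""
    productivity = [rng.gauss(0, 1) for _ in range(n)]
    accidents = [slope * p + rng.gauss(0, 0.5) for p in productivity]
    return productivity, accidents

rng = random.Random(1)
low_p, low_a = simulate_units(+0.8, 500, rng)    # low hazard: positive relation (assumed)
high_p, high_a = simulate_units(-0.8, 500, rng)  # high hazard: negative relation (assumed)

r_low = pearson(low_p, low_a)
r_high = pearson(high_p, high_a)
r_pooled = pearson(low_p + high_p, low_a + high_a)  # conditioning variable ignored
print(f"low hazard r = {r_low:.2f}, high hazard r = {r_high:.2f}, pooled r = {r_pooled:.2f}")
```

An analyst who never recorded the hazard level would see only the pooled value and might conclude that the two criteria are unrelated.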

Independent Causal Variables. Each of a 
set of job performance variables may well be 
dependent upon different causal variables, or 
differently weighted common determinants. 
There is considerable evidence already that 
different aspects of job performance may be 
to some extent independently determined 
(Brayfield & Crockett, 1955; Ghiselli, 1956; 
Vroom, 1959). Under these conditions it can- 
not be expected that criterion measures will 
be highly correlated, and it is possible also 
for fluctuations in intercorrelations to arise from differential weighting of causal elements in each situation. If, for example, Criterion Variable $V_1 = f(a x_1 + b x_2 + c x_3 + \dots)$, then in different organizations the values of the coefficients $a$, $b$, $c$, etc., could differ in such a way that Variable $V_1$ would have a different composition from situation to situation. Such a condition would lead to criterion intercorrelations of great variability, as each criterion may be a product of a different causal system.
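The differential-weighting argument can be made concrete with a short simulation. As a simplifying assumption, f is taken to be the identity, so each criterion is a weighted sum of three standardized causal variables x1, x2, x3. The three organizations and their weights are invented for illustration. The same causes, weighted differently, yield criterion intercorrelations ranging from strongly positive through zero to strongly negative.

```python
import random

def pearson(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def criterion_correlation(w1, w2, n, rng):
    """Correlate two criteria built from the same causes x1, x2, x3.

    Criterion 1 = a*x1 + b*x2 + c*x3 with (a, b, c) = w1; criterion 2 uses w2.
    """
    v1, v2 = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(3)]
        v1.append(sum(w * xi for w, xi in zip(w1, x)))
        v2.append(sum(w * xi for w, xi in zip(w2, x)))
    return pearson(v1, v2)

# Hypothetical organizations: identical causal variables, different weights.
ORGS = {
    "A": ((1.0, 0.2, 0.0), (0.9, 0.1, 0.3)),   # both criteria dominated by x1
    "B": ((1.0, 0.0, 0.0), (0.0, 0.0, 1.0)),   # criteria share no cause
    "C": ((1.0, 0.5, 0.0), (-1.0, 0.0, 0.4)),  # opposite weights on x1
}

rng = random.Random(7)
results = {name: criterion_correlation(w1, w2, 2000, rng) for name, (w1, w2) in ORGS.items()}
for name, r in results.items():
    print(name, round(r, 2))
```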

In any case, it seems clear that it is risky 
to make any simple assumptions about the 
elements which comprise “job performance,” 
and about the probable relationships among 
them in any given situation. The weight of 
the evidence favors more emphasis on the use 
of multiple criteria and, at least for the pres- 
ent, the separate treatment of each criterion. 
The practice of combining criteria is likely, 
in many cases, to simply randomize out real 
differences in quality of performance or, at 
best, to maximize a unitary construct of 
“job performance” which is unique to the 
population which originally provided the raw
data. 


SUMMARY AND CONCLUSIONS 


Intercorrelations among five job perform- 
ance variables for 27 organizations and for 
their respective and combined members show 
that the relationships are generally small, and 
that the size and direction of relationships 
are generally more variable than can be ac- 
counted for on the basis of measurement and 
sampling errors. 

These data are interpreted as contradicting 
the validity of “overall job performance” as 
a unidimensional construct, and as a basis 
for combining job performance variables into 
a single measure having general validity. The 
data also indicate that the use of a single job 
performance variable as a “sample” of a set 
of job performances is not justified without 
prior determination of interrelations among 
the different aspects of performance. 

It is proposed that the measurement and 
use of job performance criterion variables will 
remain at a primitive and empirical level 
until there is created some complex theory of 
job performance which takes into account 
systems of causal and conditioning variables. 


Stanley E. Seashore, Bernard P. Indik, and Basil S. Georgopoulos 
INTERRATER AGREEMENT AND PREDICTIVE VALIDITY
INTRODUCTION


Buckner (1959) recently presented data in- 
dicating that low interrater agreement may 
result in ratings which have higher validity 
against a complex criterion than ratings made 
by raters who agree well with one another. 
Buckner’s argument is that in a complex rating situation where different raters observe different aspects of the criterion behavior, the different raters are reporting what appear to be factor-pure components of the criterion.
Naturally, these ratings of separate compo- 
nents will have low intercorrelations. In Buck- 
ner’s situation, only one person could directly 
observe the S performing in his duty station 
aboard a submarine. It seems important, then, 
to examine the role of interrater agreement 
in other situations; for example, ones where 
more of the raters have opportunity to ob- 
serve general rather than specific aspects of 
behavior, in order to find if the conclusions 
concerning the factor model hold in other 
types of situations. 


METHOD 


In the course of developing an intelligence test for illiterate Iranians, independent ratings of the intelligence of 549 illiterate Iranian truck drivers were obtained from their supervisors. These ratings were made by having pairs of supervisors individually rank order from 16 to 42 employees in intelligence. These ranks were converted to standard scores. Each employee was also given an individually administered intelligence test. This test was in Farsi, the national language of Iran. For all 549 employees on whom the two ratings were made, a correlation between ratings and intelligence test scores of +.31 was obtained. This indicates a definite, statistically significant relationship between the test scores and the ratings. To determine the extent to which unreliability of raters or ratings affects the validity, the data were divided into four groups.

1 Formerly with the Iranian Oil Exploration and Producing Company, Iran. The views expressed in this article are those of the author and do not represent the policies of the Iranian Oil Operating Companies.

The authors wish to express appreciation to M. Javid, Siriki-Madat, A. Boldaji, A. Hozhabry, A. Sadri, Towhidi, and Tahmasebi of the Iranian Oil Companies for their roles in devising the test instrument and collecting and analyzing the data used in this report.
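The paper does not say how each supervisor's rank order was converted to standard scores. One common normalizing approach, assumed here, places rank r out of n at the (n - r + 0.5)/n quantile of the unit normal distribution, so that rankings of 16 employees and of 42 employees land on the same scale. The helper name is hypothetical.

```python
from statistics import NormalDist

def ranks_to_standard_scores(ranks):
    """Convert ranks 1..n (1 = judged most intelligent) to normalized standard scores.

    Assumption: rank r of n maps to the (n - r + 0.5) / n quantile of N(0, 1).
    """
    n = len(ranks)
    unit_normal = NormalDist()  # mean 0, standard deviation 1
    return [unit_normal.inv_cdf((n - r + 0.5) / n) for r in ranks]

# One supervisor's ranking of 16 employees (ranks listed in employee order).
scores = ranks_to_standard_scores(list(range(1, 17)))
print([round(z, 2) for z in scores])
```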

If two raters agreed on a rating of an individual so that less than .7σ separated the two ratings, it was decided that these raters agreed on the rating of this individual. If the raters’ judgments were more than .7σ apart, it was decided that the raters disagreed. The value .7σ for dichotomization was chosen arbitrarily to ensure that only relatively large differences would be classified as disagreements. The pairs of ratings of 362 Ss were in agreement and 127 pairs were in disagreement. Secondly, if two raters agreed on more than 40% of their ratings (that is, more than 40% of their ratings differed by less than .7σ), they were regarded as reliable raters. The raters who agreed less than 40% of the time (using the criterion of .7σ) were regarded as unreliable raters. The value of 40% was chosen after inspection of the data to provide approximately equal numbers of reliable and unreliable rater pairs. Thus, four groups of ratings were obtained. Group 1 contains the ratings on which raters who tended to agree also agreed. Group 2 contains the ratings on which raters who in general disagreed were nevertheless able to agree. Group 3 contains the ratings on which raters who were in general agreement deviated from each other by more than .7σ. Group 4 contains the ratings on which raters who were in general disagreement also disagreed by more than .7σ. All of these groups were formed upon inspection of the data but before the correlation coefficients were computed.
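The four-way grouping just described can be written as a short procedure. This sketch assumes each rater pair's ratings are already in standard-score units, so the .7σ cut is 0.7. It treats a difference of exactly 0.7 as a disagreement and reads "more than 40%" literally. The function name and data layout are hypothetical.

```python
def classify_ratings(pairs, cut=0.7, reliable_share=0.40):
    """Assign each rated employee to one of the four groups.

    pairs maps a rater-pair label to a list of (z1, z2) standard-score ratings,
    one tuple per employee rated by that pair. Returns {group: [(pair, index), ...]}.
    """
    groups = {1: [], 2: [], 3: [], 4: []}
    for pair, ratings in pairs.items():
        agrees = [abs(z1 - z2) < cut for z1, z2 in ratings]    # agreement on each employee
        reliable = sum(agrees) / len(agrees) > reliable_share  # agreement across all employees
        for index, agree in enumerate(agrees):
            if reliable and agree:
                group = 1  # reliable ratings from reliable raters
            elif agree:
                group = 2  # reliable ratings from unreliable raters
            elif reliable:
                group = 3  # unreliable ratings from reliable raters
            else:
                group = 4  # unreliable ratings from unreliable raters
            groups[group].append((pair, index))
    return groups

example = {
    "pair P": [(0.0, 0.1), (1.0, 0.5), (-0.5, 0.6)],  # agrees on 2 of 3: reliable
    "pair Q": [(0.0, 1.0), (1.0, -1.0), (0.2, 0.3)],  # agrees on 1 of 3: unreliable
}
print(classify_ratings(example))
```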


RESULTS 


In Table 1 we see the correlation between intelligence test scores and the average criterion ratings given by different groups of raters. The homogeneity of correlation test given by Rao (1952) indicates that all of these ratings constitute a homogeneous set (χ² = 5.1; df = 3); that is, no one of them is statistically different from the remainder of the group. Note in general, however, that the direction of the differences among correlation coefficients is the opposite of the direction found by Buckner, and, in general, they are in the direction that one would predict on the basis of common sense. The validity for the reliable raters, when they are rating reliably, is highest. This value is significantly different from zero. The validity of the ratings given by unreliable raters when they are rating unreliably does not differ significantly from zero. Thus, it appears that in situations where raters have an opportunity to observe behavior rather freely, and where the aptitude being measured is easily understood, as well as factorially complex, the factor analysis model proposed by Buckner does not seem appropriate.

TABLE 1
CORRELATIONS BETWEEN INTELLIGENCE TEST SCORES AND CRITERION RATINGS DIFFERING IN RELIABILITY

Criterion Ratings
Reliable ratings from reliable raters
Reliable ratings from unreliable raters
Unreliable ratings from reliable raters
Unreliable ratings from unreliable raters
Total
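The homogeneity test can be sketched with the usual Fisher z form of the chi-square test for k independent correlations. Each z = atanh(r) is weighted by n - 3, and the statistic has k - 1 degrees of freedom. We assume this matches the version the author took from Rao (1952). The correlations and group sizes in the usage line are invented for illustration. For reference, the .05 critical value of chi-square with 3 df is 7.815, so the reported χ² = 5.1 falls short of significance.

```python
import math

CHI2_05_DF3 = 7.815  # .05 critical value of chi-square with 3 df

def homogeneity_chi2(rs, ns):
    """Chi-square test that k independent correlations estimate one common value.

    rs: sample correlations; ns: the sample size behind each correlation.
    Returns (chi_square, degrees_of_freedom).
    """
    zs = [math.atanh(r) for r in rs]  # Fisher z transformation
    weights = [n - 3 for n in ns]     # inverse of the sampling variance of z
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    chi2 = sum(w * (z - z_bar) ** 2 for w, z in zip(weights, zs))
    return chi2, len(rs) - 1

# Hypothetical validities and group sizes for four rating groups.
chi2, df = homogeneity_chi2([0.45, 0.30, 0.25, 0.20], [200, 160, 70, 60])
print(f"chi-square = {chi2:.2f}, df = {df}, homogeneous at .05: {chi2 < CHI2_05_DF3}")
```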


The factor analysis model proposed by Dingman and Guilford (1954), and revised by Jones (1957), is based on the statistical adjustment of ratings so that only one factor in the judgments of raters is measured in the composite rating. Thus, the ratings are weighted by their agreement rather than by their disagreement, as Buckner suggests. On the basis of the data presented in this paper, which are inconsistent with the data presented in Buckner’s paper, the suggestions proposed by Dingman and Guilford, and by Jones, seem more generally applicable.
Recent studies of vigilance behavior have
concentrated on the role of signal character- 
istics in the maintenance of attention. One of 
the more important signal characteristics has 
been found to be signal density, also called 
signal rate or frequency. Several studies 
(Deese & Ormond, 1953; Mackworth, 1950; 
Weldon, Yafuso, & Peterson, 1956) have indi- 
cated that the probability of signal detection 
is some increasing function of the density of 
the signal; however, the precise functional re- 
lationship between these two variables is not 
fully known. 

Deese and Ormond (1953) found that the 
percentage of signals detected in a simulated 
radar-monitoring task increased with the fre- 
quency of signals presented within a fixed 
time interval. Mackworth (1950) found a 
similar effect when Ss monitored movements 
of a pointer in a clock-watching task. In a
dial setting task, Weldon et al. (1956) found 
that the percentage of errors detected in- 
creased with increasing numbers of errors in 
the task. There was a tendency, however, for 
the percentage of errors detected to level off 
and remain fairly constant beyond a certain 
point. No evidence is known to the authors 
which indicates if this relationship of in- 
creased error detection efficiency persists at 
relatively high levels of error, or if this rela- 
tionship exists in other types of monitoring 
tasks, such as proofreading. 

Two related problems involve the effect of 
signal or error density on the frequency of 
false reports, and the relation between false 
reports and frequency of correct detections. 
Deese (1955) found no evidence for a rela- 
tion between the frequency of false reports 
and frequency of correct detections, and he 
reports no evidence on the effect of signal 
density on false reports. 

A final issue concerns the effect of shifts in 
signal density on the probability of signal de- 


tection. It might be theorized that Ss estab- 
lish some kind of set or expectancy while en- 
gaged in a vigilance task at a fixed error- 
density level. When, unknown to S, a change 
is made in the error density, an expectancy, 
if already acquired, may act to facilitate or 
reduce the probability of error detection. 
Some basis for this reasoning comes from a study by Mowrer (1950), who found evidence for a preparatory-set phenomenon in reaction time. RT latency increased as a function of the change in the interval between a warning signal and the stimulus signal, relative to the interval that had been held fixed during the training period.

Specifically, this study was designed to 
yield data on the following: 

1. What is the effect of error density on 
the probability of error detection in a proof- 
reading task? 

2. What is the effect of error density on 
the number of false reports made? 

3. Is there a relationship between the num- 
ber of false reports and the number of cor- 
rect responses? 

4. What types of error are detected most 
frequently? 

5. What effect do shifts in error density 
have on error detection, i.e., do Ss acquire 
expectancies which facilitate or inhibit error 
detection when changes in signal density 
occur? 


METHOD 


Subjects. Two hundred and sixteen students, se- 
lected from classes at the University of New Mexico, 
served as Ss. 

Materials. An excerpt from Canada, by Andre 
Siegfried (1937), was employed as the proofreading 
material. A passage was selected containing material 
which was relatively homogeneous in content, and 
which was interesting without being emotion-pro- 
voking. The experimental manuscript consisted of 15 
pages of typed, double-spaced material.

Types of Errors. Three types of typographical errors were built into the manuscript: omission, transposition, and substitution. These were selected because of their prominence in proofreading, as reported by Scheidt (1940). A description and example of each error type are shown below:

Error            Description                                     Example
Omission         One letter deleted                              your → yor
Transposition    Exchange in position of two adjacent letters    time → tmie
Substitution     One letter wrong                                work → wark

These errors were placed in proportionate numbers in the manuscript, and the position of the error, the specific word to be altered, and the specific letter in each word to be altered were determined in a random fashion.
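The error-planting procedure can be sketched as follows. This is an illustrative reconstruction, not the authors' procedure. The helper names are hypothetical, and "proportionate numbers" is interpreted as equal numbers of the three error types, assigned in rotation. Target words and letter positions are chosen at random. Words made of a single repeated letter are skipped so that a transposition always changes the word.

```python
import random
import string

def omission(word, rng):
    """Delete one randomly chosen letter (your -> yor)."""
    i = rng.randrange(len(word))
    return word[:i] + word[i + 1:]

def transposition(word, rng):
    """Swap two adjacent, non-identical letters (time -> tmie)."""
    spots = [i for i in range(len(word) - 1) if word[i] != word[i + 1]]
    i = rng.choice(spots)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def substitution(word, rng):
    """Replace one randomly chosen letter with a different letter (work -> wark)."""
    i = rng.randrange(len(word))
    new = rng.choice([c for c in string.ascii_lowercase if c != word[i].lower()])
    if word[i].isupper():
        new = new.upper()
    return word[:i] + new + word[i + 1:]

ERROR_TYPES = [omission, transposition, substitution]

def plant_errors(words, n_errors, rng):
    """Alter n_errors randomly chosen words; return the new text and a log of changes."""
    eligible = [i for i, w in enumerate(words) if w.isalpha() and len(set(w)) > 1]
    targets = rng.sample(eligible, n_errors)
    altered = list(words)
    log = []
    for k, i in enumerate(targets):
        make_error = ERROR_TYPES[k % len(ERROR_TYPES)]  # equal proportions by rotation
        altered[i] = make_error(words[i], rng)
        log.append((i, make_error.__name__, words[i], altered[i]))
    return altered, log

words = "your time and work were all carefully checked by the proofreader".split()
altered, log = plant_errors(words, 6, random.Random(0))
for entry in log:
    print(entry)
```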

Design. Six groups of Ss were required to proofread the manuscript within a 50-minute period. Ss were tested in groups ranging from 15 to 50. Differential treatment of the groups consisted of six levels of error density, defined as the frequency of built-in typographical errors per five pages. Since each page contained about 265 words, there were approximately 1555 words per five pages. Thus an error density of 15 indicates 15 errors per 1555 words, and an error density of 120 indicates 120 errors per 1555 words. Groups I through VI consisted of 36 Ss each, assigned at random, who faced a 6-, 9-, 15-, 30-, 60-, or 120-error density level for the first 10 pages of the manuscript. The last five pages, unknown to the Ss, were set at three different error levels within each group: one third of the Ss in each of the six initial groups had their error levels changed to 6, 30, or 120 errors for the remaining five pages. The experiment was cast in the form of a 6 × 3 factorial design with Ss randomized, permitting a test of: (a) the effect of error density on detection of errors; (b) the effect of a set, if established in the proofreading of the first 10 pages, on the detection of errors in the last five pages; and (c) the possible interaction of the two treatment variables.

Instructions and Procedure. All Ss were given identical instructions in the proofreading task, together with examples of the types of proofing errors to be found. Although Ss were limited to 50 minutes, no S needed a longer time to complete the task.
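The 6 × 3 assignment of 216 Ss can be sketched as a randomized allocation to 18 cells of 12 Ss each. Each initial-density group then has 36 Ss, and one third of each group shifts to each final density. The seed and function name are illustrative assumptions, and the sketch ignores how Ss were scheduled into testing sessions.

```python
import random
from collections import Counter

INITIAL_DENSITIES = [6, 9, 15, 30, 60, 120]  # errors per five pages, pages 1-10
FINAL_DENSITIES = [6, 30, 120]               # errors per five pages, pages 11-15

def assign_subjects(n_subjects=216, seed=0):
    """Randomly assign subjects to the 18 cells of the 6 x 3 design, equal n per cell."""
    cells = [(a, b) for a in INITIAL_DENSITIES for b in FINAL_DENSITIES]
    per_cell = n_subjects // len(cells)               # 216 / 18 = 12
    slots = [cell for cell in cells for _ in range(per_cell)]
    random.Random(seed).shuffle(slots)                # random order = random assignment
    return dict(enumerate(slots))                     # subject id -> (initial, final)

assignment = assign_subjects()
by_initial = Counter(initial for initial, _ in assignment.values())
by_cell = Counter(assignment.values())
print(by_initial)                                     # 36 Ss per initial density
print(min(by_cell.values()), max(by_cell.values()))   # 12 12
```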


RESULTS 


Error Detection as a Function of Error Density

Figure 1 indicates how the percentage of 
error detection varies as a function of error 
density for the first 10 pages of the manu- 
script. In general, error detection increased 
up to some maximum point, beyond which 
increments in error density reduced the frequency of detection.

The results of an analysis of variance per- 
formed on percentage of error detections are 
shown in Table 1. The analysis yielded an F 
of 9.13 for error density, which was signifi- 
cant beyond the .01 level of confidence. 
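A one-way analysis of variance like the one above can be computed directly from the group scores. This is a generic sketch, and the small data set in the usage line is invented. Assuming one percentage-detected score per S, the design (six groups of 36) implies 5 and 210 degrees of freedom for the error-density test.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

print(one_way_anova([[1, 2, 3], [4, 5, 6]]))  # (13.5, 1, 4)
```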

Although the manuscript dealt with one 
topic and was treated as a continuous task, 
it was regarded as desirable to examine in- 
dependently the efficiency of error detection 
for the first and second five pages of the 
manuscript; this was feasible because error 
density was constant for these parts of the 
manuscript. This procedure provided infor- 
mation about the stability of error detection 
performance over time. Inspection of the 
curves of error detection as a function of 
error density indicated that no significant 
change in detection efficiency occurred from 
the first to the second part of the manuscript. 
An analysis of variance of the percentage of errors detected, performed separately on the first and second parts of the manuscript, supported this interpretation.

TABLE 1
ANALYSIS OF VARIANCE OF PERCENTAGE OF ERRORS

Source of Variance
Error density
Within groups
Total

* Significant at the .01 level.
TABLE 2
LEVELS OF ERROR DENSITY AND FALSE DETECTIONS

Group    Error Density    Number of False Detections
I        6                181
II       9                133
III      15               124
IV       30               229
V        60               174
VI       120              188


False Detections at Various Error Levels 


The number of false detections (errors reported that had not been built into the manuscript), as a function of error density, is shown in Table 2. False detections appear somewhat more frequently under conditions of high error density; however, the relationship between error density and false detections does not appear to be very systematic, and an analysis of variance of the data shows that this trend is not significant at the .05 level of confidence.

Although false detections may not depend 
upon signal or error density, it is conceiv- 
able that false and correct detections are in 
some way related. To test this hypothesis a 
rank-difference correlation between the fre- 
quency of correct detections and the number 
of false detections was computed; the ob- 
tained rho was 0.49, which was significant 
beyond the .05 level of confidence. 
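The rank-difference correlation used here is Spearman's rho, 1 - 6Σd²/[n(n² - 1)], where d is the difference between an S's two ranks. The sketch below gives tied scores the mean of the ranks they span. With ties, the formula is the customary approximation rather than an exact product-moment correlation of the ranks. The example scores are invented.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend the run of tied values
        mean_rank = (i + j) / 2 + 1     # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_difference_rho(xs, ys):
    """Spearman's rank-difference rho: 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

correct = [41, 35, 50, 28, 44]  # hypothetical correct detections per S
false = [9, 6, 12, 7, 8]        # hypothetical false detections per S
print(round(rank_difference_rho(correct, false), 2))
```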

Type of error and detection efficiency. An 
analysis of proofreading efficiency and types 
of error is summarized in Table 3. The table 
indicates that the most difficult type of error 
to detect was omission, followed by transposi- 
tion and substitution, and that this relation- 
ship remained stable throughout the proof- 
reading task. 


TABLE 3
PERCENTAGE OF TYPES OF ERRORS DETECTED

                   Manuscript Pages
Error Type         6-10      11-15
Omission           .744      .720
Transposition      .776      .810
Substitution       .889      .870
Fig. 2. Percentage of errors detected under new error density conditions as a function of previously experienced error density (plotted on a log scale). New (present) error density is employed as a parameter.
Influence of Previous Density on Present Detection

In an attempt to determine if Ss acquire 
some sort of set or expectancy, as a result of 
having 10 pages of proofreading experience 
at a fixed level of error density, the last five 
pages of the continuous manuscript were di- 
vided into three levels of error density. Fig- 
ure 2 shows the percentage of errors detected 
for the last five pages of the manuscript as a 
function of error detection experience during 
the first 10 pages of proofreading. The curves 
suggest that previous experience in detecting 
a large number of errors may have some 
slight facilitating effect in the later detection 
of errors. The results of an analysis of variance of the data, however, yielded an F of 2.71 for previous experience in detection (df = 5, 198), which was not significant at the .05 level of confidence. The analysis indicates
that previous detection experience at varying 
error rates does not differentially influence 
the later detection of errors. Secondly, al- 
though the curves in Fig. 2 suggest that error 
detection efficiency increases with the density 
of errors, statistical analysis of the data does 
not support this interpretation. In contrast, 
there is clear indication that the differential 
effect of error density tends to diminish dur- 
ing the proofing of the last five pages of the 
manuscript. 
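The degrees of freedom reported for this analysis follow from the 6 × 3 design with 216 Ss. The sketch below does the bookkeeping for a two-way between-subjects analysis of variance, assuming one score per S. The previous-experience factor has 6 - 1 = 5 df, and the within-cells term has 216 - 18 = 198 df.

```python
def factorial_df(a_levels, b_levels, n_total):
    """Degrees of freedom for a two-way between-subjects factorial ANOVA."""
    cells = a_levels * b_levels
    return {
        "A": a_levels - 1,                           # previous error density
        "B": b_levels - 1,                           # present error density
        "AxB": (a_levels - 1) * (b_levels - 1),      # interaction
        "within": n_total - cells,                   # error term
        "total": n_total - 1,
    }

print(factorial_df(6, 3, 216))
# {'A': 5, 'B': 2, 'AxB': 10, 'within': 198, 'total': 215}
```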


False Detections after Shifts in Error Density 
No significant effect of previous experience, 
or error density, was found on the frequency 
of false detections during the last five pages
of the manuscript. On the other hand, a rank- 
difference correlation of 0.53 was obtained 
between the frequency of false detections and 
the percentage of correct detections, which 
was significant beyond the .05 level of con- 
fidence. 


DISCUSSION 


The results show that error detection effi- 
ciency in a proofreading task is a curvilinear 
function of the density of the errors in the 
manuscript. This result is in partial agree- 
ment with the work of Deese and Ormond 
(1953) and Weldon et al. (1956). On the 
other hand, neither Deese and Ormond, nor 
Weldon et al., found a significant decrease in 
detection efficiency at high levels of error 
density. Although this effect may be peculiar 
to proofreading tasks, a more likely explana- 
tion would be that all vigilance tasks are sub- 
ject to decreasing efficiency at extremely high 
error levels, and that this extreme level had 
not been met in the studies mentioned. 

Attempts to explain this error density-detection relationship have centered around models employing concepts from conditioning (Broadbent, 1953; Holland, 1958; Mackworth, 1950) and expectancy theories (Deese, 1955). Both theories would predict an increase in error detection with error density; however, the experiment provides no crucial test of either.

False detections were found not to be a 
function of error density; however, false de- 
tections and correct detections were posi- 
tively correlated. This latter finding was not 
reported by Deese (1955), who found no cor- 
relation between false reports and the prob- 
ability of detection in radar monitoring. It 
may be that proofreading provides many 
more stimuli capable of eliciting false re- 
ports whereas radar monitoring, by the na- 
ture of the task, provides relatively few 
stimuli. 

Shifts in error density, after Ss had ex- 
perience with fixed levels of error, failed to 
produce significant changes in error detec- 
tion. It is possible, of course, that expectan- 
cies are really acquired as a function of 
proofreading at fixed levels of error but that 
the conditions of this study were too insensitive to demonstrate them.


SUMMARY 


This experiment was designed to determine 
the effects of signal density on frequency of 
correct and false detections, to study the 
types of errors detected, and to examine the 
effects of shifts in signal density on error de- 
tection in a proofreading situation. The task 
consisted of a passage from Canada which 
was proofread by 216 Ss. Differential treat- 
ment of the groups consisted of six levels of 
error density: 6, 9, 15, 30, 60, or 120 for the 
first 10 pages of the manuscript; and then 
shifts to error levels of 6, 30, or 120 for the 
last five pages. Ss read the passage as a con- 
tinuous manuscript and were not informed as 
to the error levels they were working under, 
or to the shifts in error density. The basic 
results are as follows: 

1. For the first 10 pages of the manuscript, 
increases in error density produced increases 
in error detection up to the 30-error level. 
Beyond the 30-error level, error detection 
efficiency was reduced. 

2. False reports were found not to be a 
function of the density of the errors; on the 
other hand, the correlation between false de- 
tections and correct responses was 0.49. 

3. The most difficult of the three types of 
error was omission, followed by transposition 
and substitution. 

4. Shifts in error density, after Ss had ex- 
perienced a fixed level of error, produced no 
significant change in the percentage of errors 
detected. 

5. After shifts in error density had oc- 
curred, false detections were found, again, not 
to depend on the new error level. In contrast, 
false reports and correct detections remained 
related, the correlation being 0.53. 


REFERENCES 

Broadbent, D. E. Classical conditioning and human watch-keeping. Psychol. Rev., 1953, 60, 331-339.

Deese, J. Some problems in the theory of vigilance. Psychol. Rev., 1955, 62, 359-368.

Deese, J., & Ormond, E. Studies of detectability during continuous visual search. Wright Air Development Center, Tech. Rep., WADC-TR-53-8, 1953.





Error Density and Set in a Vigilance Task 


Holland, J. G. Human vigilance. Science, 1958, 128, 61-67.

Mackworth, N. H. Researches on the measurement of human performance. Med. Res. Council, Special Rep. No. 268. London: His Majesty’s Stationery Office, 1950.

Mowrer, O. H. Preparatory set (expectancy): Some methods of measurement. In O. H. Mowrer (Ed.), Learning theory and personality dynamics. New York: Ronald, 1950.

Scheidt, V. P. Research in proof reading. Kalends, 1940, 19, No. 1, 7; No. 2, 5-10.

Siegfried, A. Canada. Trans. by H. H. Hemming and Doris Hemming. New York: Harcourt, 1937.

Weldon, R. J., Yafuso, Ryoko, & Peterson, G. M. Factors influencing dial operation: II. Special-purpose double-number dials. Sandia Corp., 1956, Tech. Rep., SC-3839-TR.


(Received August 21, 1959) 





Journal of Applied Psychology 
1960, Vol. 44, No. 3, 210-215 
ROLE OF TEACHING¹
This work began as an investigation of the
underlying motivations of teachers with the 
expectation that projective measures of inter- 
ests and needs would cast light on why some 
teachers are enthusiastically committed to 
teaching and others are not. It soon became 
clear, however, that the more straightforward 
and possibly more productive approach was 
to look at more manifest motivation. If, by 
means of a questionnaire, a group of stu- 
dent teachers were asked about their attitudes, 
expectations, and commitment to teaching, 
would not more be learned about their mo- 
tivation to teach than by less direct means? 
If nothing else, this direct approach seemed 
like a necessary first step, as Allport (1953) 
has argued. 

A long questionnaire was designed to re- 
veal acceptance of and involvement in teach- 
ing. The objective was to discriminate be- 
tween those who saw themselves as teachers 
and liked this self-perception and those who 
were mildly intrigued with teaching but did 
not identify with classroom teachers. The 
working hypothesis was that the crucial de- 
terminant of how much energy a teacher will 
persistently devote to teaching is the extent 
to which he accepts the role as “propriate”— 
to use a term proposed by Allport to signify 
a life-style which is of “strong personal rele- 
vance” and “central to our sense of existence”
(1955, p. 40). How “propriateness” is evaluated by a person depends on the interaction of three different role perceptions: the S’s perception of the role in question, his conception of an ideal or optimum role for himself, and his perception of his present role. If the three roles are congruent, high role acceptance can be predicted. In other words, it is proposed that acceptance of a role is not simply a matter of its absolute attractiveness. But this proposal raises more questions than it answers, so it will not be pursued further here. It is offered only to inform the reader of the theoretical expectations which guided the data collection.

1 In gathering the data for this study, the author made extensive use of the facilities of the Teacher Education Research Project, sponsored by the Ford Foundation Fund for the Advancement of Education at the Harvard Graduate School of Education. The author is indebted to Harry Levin, former director of the Project, Gordon W. Allport, and John B. Carroll for their assistance.

This paper is based in part on the author’s PhD dissertation at Harvard University. For fuller details see Hilton (1955).
METHOD 


General Procedure 


The first task was to obtain the best available cri- 
terion measure of acceptance of teaching. Then, the 
concomitants and antecedents of high acceptance 
were investigated by obtaining the correlations of a 
large pool of items with this criterion. Third, from 
a selection of the related items a so-called Index of 
Role Acceptance was constructed.² Fourth, the Index
was cross-validated with a second sample. Finally, 
the predictive validity of the Index was tested by 
means of a measure of permanence-in-teaching ob- 
tained from a follow-up questionnaire. 


Subjects 


The Ss were students of the Graduate School of 
Education of Harvard University who attended the 
testing sessions of the Teacher Education Research 
Project held during the first week of classes in the 
fall of 1953 and fall of 1954. Although attendance 
was officially voluntary, students were given the im- 
pression that everyone was expected to attend. Of 
the total of 275 students enrolled in 1953 and 1954, 
80% both attended the testing sessions and returned 
the long questionnaire which they were allowed to 
take home. 

All Ss were recent graduates of liberal arts col- 
leges. Few had any formal teaching experience or 
courses in education. Eighteen percent were married. 
About one-third were women enrolled in the early 
childhood and elementary school teaching programs, 
one-third were men preparing to teach in secondary schools, and one-third were women in the secondary school program. At the time of testing, none of the Ss was formally committed to enter teaching.

2 Originally, we spoke of “ego-involvement in a role,” and the author’s doctoral dissertation (Hilton, 1955) used this term. We have since decided, however, that the term role acceptance is more appropriate, in part for reasons which will be mentioned later.

The Criterion Ratings

To establish a criterion, the investigator chose to rely on ratings based on each S’s questionnaire responses concerning how much he had accepted teaching as an occupation. Primarily for this reason the “acceptance” measured is referred to as “alleged acceptance.” The written responses were in answer to 63 open-ended questions of a 16-page questionnaire, which was designed to tap every aspect of each student’s attitudes and history that might conceivably have a bearing on his acceptance of teaching. The first question is typical:

People have many different reasons for choosing a particular vocation. Discuss your early thinking about choosing a career and the development of your thinking about this.

Two judges (the author and an associate) independently studied each questionnaire and assigned the respondent to one of three levels of role acceptance: high (3), doubtful or moderate (2), and low (1). Pooling the two ratings provided a rating ranging from 2 to 6 for each S. This rating might be called the Subjective Rating of Alleged Role Acceptance; it will be referred to here as the Rating.

Bases for Ratings 


In many cases, the ratings of the two judges were based on unambiguous statements of high or low acceptance of teaching. Examples:

High

I am extremely enthusiastic about the prospect of teaching, I love children, and I have had extensive preparation.

This is what I really want to be and I’m going to devote my time to becoming a teacher.

Low

I would like to be a writer but I feel at present that I need some sort of economic security to carry out this aim.

I have not sufficient interest in teaching. I don’t want to teach.

I have been led into more advanced study of child development because it fascinated me. I actually do not believe I would care to teach.


Such statements as the above were accepted only 
when the balance of the questionnaire was not in- 
consistent with them. When there was a reasonable 
doubt concerning the interpretation of a statement 
and no other evidence, the S was rated as “doubtful.” 


Rating Agreement 


For the 1954-55 sample of 122 Ss the percentage agreement between the two sets of ratings is 68%, i.e., for 68% of the Ss both judges assigned the S to the same level of role acceptance. The product-moment correlation between the ratings by the two judges is .53. With this degree of correlation between the two sets of ratings, each rating being on a scale with only three categories, the investigator felt justified in using the Rating to establish high and low criterion groups.
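The two agreement statistics, and the pooling of the judges’ 1-3 ratings into a 2-6 Rating, can be illustrated with a short Python sketch. The eight Ss and their ratings below are hypothetical (individual ratings are not reported); percentage agreement is simply the share of Ss placed at the same level, and the product-moment correlation is an ordinary Pearson r written out with the standard library.

```python
from statistics import mean, pstdev


def percent_agreement(judge_a, judge_b):
    """Percentage of Ss that both judges placed at the same level (1, 2, or 3)."""
    same = sum(1 for a, b in zip(judge_a, judge_b) if a == b)
    return 100.0 * same / len(judge_a)


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of ratings."""
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pstdev(x) * pstdev(y))


def pooled_rating(judge_a, judge_b):
    """Sum of the two judges' 1-3 ratings, giving a pooled Rating from 2 to 6."""
    return [a + b for a, b in zip(judge_a, judge_b)]


# Hypothetical ratings for eight Ss: 3 = high, 2 = doubtful, 1 = low.
judge_a = [3, 3, 2, 1, 2, 3, 1, 2]
judge_b = [3, 2, 2, 1, 3, 3, 1, 1]

print(percent_agreement(judge_a, judge_b))    # 62.5
print(round(pearson_r(judge_a, judge_b), 2))  # 0.74
print(pooled_rating(judge_a, judge_b))        # [6, 5, 4, 2, 5, 6, 2, 3]
```

Percentage agreement and r answer different questions: agreement counts exact matches only, while r also credits judges who disagree by one level in a consistent direction.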


The Index 


After a content analysis of the written answers to the 63 open-ended questions of the questionnaire, the investigator created 100 items which were designed to have mutually exclusive alternatives and to exhaust the relevant information in each questionnaire. Next, the items were scored for each S and the correlation of each item with the Rating was obtained. Third, the Index was constructed from a weighting of the 24 items which were most highly related to the Rating. Unlike the highly subjective Rating, the Index values are objective insofar as no judgment was exercised in computing them.

All the steps described so far involved data obtained from the 1954-55 sample. As the fourth step the Index was computed for the 98 members of the 1953-54 sample. The correlation of each item of the Index with the total value of the Index for this independent sample provided an indication of the cross-sample stability of each item.
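Scoring the Index amounts to adding the weights of the responses an S gives. The sketch below copies the response weights of three rows of Table 1 and applies the cutoff of 19 or more that the results report for the high Index groups; the dictionary keys are hypothetical shorthand for the printed response categories, and a full Index would use all 24 items.

```python
# Response weights copied from three rows of Table 1; the keys are
# hypothetical shorthand labels for the response categories printed there.
WEIGHTS = {
    "other_professions": {"none": 1, "one_or_more": 0},
    "resolution": {"excellent": 2, "satisfactory": 1,
                   "reluctant_or_unsatisfactory": 0},
    "good_teacher": {"very_interested": 2, "confident": 1, "not_expected": 0},
}

HIGH_CUTOFF = 19  # high groups had Index values of 19 or more (24-item Index)


def index_value(responses, weights=WEIGHTS):
    """Total Index value: the sum of the weights of the S's chosen responses."""
    return sum(weights[item][answer] for item, answer in responses.items())


def index_group(value, cutoff=HIGH_CUTOFF):
    """Assign an S to the high or low Index group."""
    return "high" if value >= cutoff else "low"


s1 = {"other_professions": "none", "resolution": "satisfactory",
      "good_teacher": "very_interested"}
print(index_value(s1))                   # 4
print(index_group(19), index_group(18))  # high low
```

Because no judgment enters the sum, two scorers working from the same answers always obtain the same Index value, which is the sense in which the Index is objective.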


Predictive Validity Check 


To ascertain whether the Index as a whole or any of its items had any relationship to teaching performance, the investigator looked at the teaching experiences of the 1953-54 sample one year after they had completed their graduate training (about two years after they had served as Ss for the Project). By means of a follow-up questionnaire, 94 of the original 98 Ss of the 1953-54 sample were assigned to one of three broad categories depending on the extent to which they were permanently engaged in teaching.

Roughly speaking, the so-called high-permanence group (N = 58) are those who taught full-time in the 1954-55 academic year and indicated that they definitely planned to continue teaching in the future; the doubtful group (N = 27) are those who indicated that there was some doubt in their minds about teaching in the future; and the low-permanence group (N = 9) are those who had withdrawn from teaching or who indicated that they planned to withdraw and probably would not teach in the future. The mean Index values of the three permanency groups were then examined. It was predicted that one of the results of alleged high acceptance of teaching would be high permanence in teaching.


RESULTS 


Table 1 gives the chi squares of each item with the Ratings for the first sample, and with the total Index values for both samples, when the samples were divided into high and low groups. The weight assigned to each response of each item is also indicated. For the first sample, each item is positively correlated with the total Index values. The odd-even reliability of the Index, corrected by means of the familiar Spearman-Brown formula, is .65; the mean is 18.1, and the standard deviation 4.2. The Index values of the first sample have a product-moment correlation of .66 with the pooled judges’ Ratings.

For the first sample, 21 of the 24 items are significantly related to the total Index values, and for the second sample 13 are so related. For both samples all correlations are in the expected direction.

3 When the first sample was divided on the Ratings, the high group (N = 60) had a Rating of 5 or 6 and the low group (N = 62) a Rating of 4 or less. When both the first and the second samples were divided on the Index values, the high groups had Index values of 19 or more and the low groups, Index values of 18 or less.

4 These correlations are, of course, to some extent spurious since the total Index value was derived from a weighting of the items themselves, but with 24 items and a fairly homogeneous scale it is highly unlikely that the correlations are entirely spurious.

TABLE 1
CHI SQUARE OF INDEX ITEMS WITH RATINGS (FIRST SAMPLE) AND TOTAL INDEX (FIRST AND SECOND SAMPLES)
(Numbers in parentheses indicate weight assigned to response in computing total Index values)

Columns: 1st Sample, Ratings; 1st Sample, Index; 2nd Sample, Index

No other professions considered as seriously as teaching (1) vs. one or more (0)    4.06*   3.3   0.00
Teaching alleged to be excellent resolution (2) vs. satisfactory resolution (1) vs. reluctant or unsatisfactory resolution (0)    3.66
One person or less influenced decision to teach (1) vs. two or three persons (0) vs. four or more persons (1)
Good teaching or teaching experience influenced decision to teach (1) vs. other experience, mostly aversive (0)
Specific teaching job wanted five years hence (2) vs. administrative work or marriage (1) vs. other jobs or undecided (0)
No particular geographical location preferred (2) vs. other preferences (1) vs. particular spot or socioeconomic status of spot vital (0)
Less than $3500 adequate as starting salary or undecided (1) vs. $3500 or more considered adequate (0)
Less than $3500 adequate 5-6 years hence (2) vs. $3500 to $4499 (1) vs. $4500 or more (0)
Would not be unhappy if nonpreferred subject taught (1) vs. would be unhappy (0)
Will be good teacher because very interested (2) vs. confident of ability or qualities (1) vs. do not expect to be good teacher (0)
No other occupations provide similar satisfactions (2) vs. some other occupations (1) vs. clinical, personnel or creative work (0)
“More opportunity elsewhere” not possible reason for leaving education (1) vs. would leave if more opportunity elsewhere (0)
Less than two reasons given for possibly leaving education (1) vs. two or more reasons (0)
Would not leave teaching for high salary elsewhere (2) vs. possibly but not likely (1) vs. would do so or uncertain (0)
Would not leave teaching regardless of salary elsewhere or salary would have to be more than $9000 (1) vs. undecided or would for less than $9000 (0)
Parents’ wishes very important (2) vs. fairly or quite so (1) vs. unimportant (0)
Other members of family besides mother and father approve of career choice (1) vs. some disapproval or apathy (0)
Friends approve of career choice (1) vs. some disapproval or apathy (0)
Former teachers approve of career choice (1) vs. some disapproval or apathy (0)
Steady wages, good hours, etc., not cited as advantages of teaching (1) vs. steady wages, good hours, etc., mentioned (0)
Personal restrictions not cited as disadvantage of teaching (1) vs. personal restrictions mentioned (0)    7   +.89*
Thirty or more words written in answer to question asking what education should accomplish in the future (1) vs. less than 30 words (0)    5.2   6.06*
To be of service is very important (1) vs. quite important or qualifications (0)    7   6.84*   9.08*
Intellectual stimulation expected (1) vs. qualifications (0)    1 23*   5.71*

* Significant at the .05 level.

TABLE 2
MEAN AND STANDARD DEVIATION OF INDEX FOR DIFFERENT PERMANENCY GROUPS

Permanency groups: High, Doubtful, Low, Total
Programs: Elem. and E.C. (Men, Women); Secondary (Men, Women)

Analysis of Variance

Source          SS        df    MS
Permanency      116.37     2    58.18
Sex              78.11     1    78.11
Perm. × Sex      94.80     2    47.40
Residual        893.15    88    10.15
Total          1182.43    93
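Each chi square in Table 1 compares how a high group and a low group distribute over an item’s responses. A minimal Pearson chi-square sketch follows; the counts are hypothetical (only the group sizes of 60 and 62 come from the text), no continuity correction is applied because the text does not say whether one was used, and the two critical values are the standard upper .05 points for 1 and 2 degrees of freedom.

```python
# Upper .05 critical values of chi square for 1 and 2 degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991}


def chi_square(table):
    """Pearson chi square and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts: rows are the high (N = 60) and low (N = 62) Rating
# groups; columns are the weight-1 and weight-0 responses to a two-way item.
table = [[40, 20],
         [25, 37]]
chi2, df = chi_square(table)
print(round(chi2, 2), df, chi2 > CRITICAL_05[df])  # 8.5 1 True
```

Items with three response categories give a 2 x 3 table and therefore 2 degrees of freedom, so they need a larger chi square to reach the .05 level.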


Predictive Validity of Index 


Table 2 gives the mean and standard de- 
viation of the Index for the men and women 
of each permanency category of each teacher 
training group. As expected, the mean Index 
values generally are greater for the higher 
permanency groups. An analysis of the vari- 
ance of the means indicates that the overall 
means of the three permanency groups are 
different and, in addition, that there is a sig- 
nificant interaction of permanency and sex. 
Apparently, the Index significantly foretells 
the permanency of the women but not of the 
men. The analysis also indicates that the 
mean for the women is higher than the mean 
for the men. The product-moment correlation 
between alleged acceptance and permanency 
is .23 for the total group, varying from .02 
for the secondary men to .35 for the elemen- 
tary women. 
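The F ratios behind these conclusions are not printed in the surviving table, but they follow from its sums of squares. The sketch below takes the SS values from the analysis of variance in Table 2; the degrees of freedom (2, 1, 2, and 88) are inferred from the printed mean squares (SS divided by MS) and are therefore a reconstruction. Significance would still have to be judged against tabled F values for those degrees of freedom.

```python
# Sums of squares and (inferred) degrees of freedom from Table 2.
EFFECTS = {
    "Permanency": (116.37, 2),
    "Sex": (78.11, 1),
    "Perm. x Sex": (94.80, 2),
}
RESIDUAL = (893.15, 88)


def mean_squares(effects, residual):
    """Mean square (SS / df) for each source, including the residual."""
    ms = {name: ss / df for name, (ss, df) in effects.items()}
    ms["Residual"] = residual[0] / residual[1]
    return ms


def f_ratios(effects, residual):
    """F ratio of each effect's mean square to the residual mean square."""
    ms = mean_squares(effects, residual)
    return {name: ms[name] / ms["Residual"] for name in effects}


for name, f in f_ratios(EFFECTS, RESIDUAL).items():
    print(f"{name}: F = {f:.2f}")
```

The four sums of squares add to the printed total of 1182.43, and the degrees of freedom add to 93, one less than the 94 Ss, which supports the inferred values.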

As for the correlation of the separate items of the Index with permanency, only a few scattered items are significantly related. By and large the items are related in the expected way. Item 43 (will be good teacher because very interested) is the most highly correlated item. There are, however, some striking reversals. Item 41 (willing to teach nonpreferred subjects), which was scored as an indicant of high acceptance, was found to be negatively related to permanency. Also Item 72 (favorableness of friends’ opinions of career choice) has a significant negative correlation.

DISCUSSION 


Any evaluation of the results depends on 
which measures are accepted as valid cri- 
teria. Let us assume then that the Ratings 
served only as a means of screening a mass 
of information. Therefore, no conclusions will 
be drawn from the content of the items which 
were found to correlate with the judges’ 
ratings. 

Certain items, however, correlated not only 
with the Ratings of the first sample but also 
with the Index values of both samples. There 
are, then, attitudes and events which differ- 
entiate a group of Ss who allege high accept- 
ance from a group of Ss who do not. The in- 
vestigator feared that alleged role acceptance 
might prove to be such a complex state, so 
unique to each individual, that no such com- 
mon attitudes and events would be found. 
Evidently, this is not the case. 

Furthermore, the discriminating items ap- 
pear to be consistent with the conception of 
role acceptance proposed earlier. For exam- 
ple, a larger proportion of the highs say that 
no other occupations would provide satisfac- 
tions similar to those they expect to find in 
teaching (Item 45-46). Also, more of the 
highs cite less than two reasons for possibly 
leaving education (Item 51). From these two 
items, it might be concluded that the highs 
perceive less discrepancy between teaching 
and their conception of an ideal role. 

Another series of items suggests hypotheses in regard to the rewards which “high acceptors” anticipate. They appear to place less emphasis on what might be called extraneous rewards, e.g., steady wages, salary, vacations, as opposed to satisfactions which are more nearly unique to teaching, e.g., working with children, being of service, intellectual stimulation. The approval of significant figures in the S’s environment also appears to play an important part in the alleged role acceptance.


Validity 


The adjective alleged implies that the true 
covert acceptance of some of the Ss may have 
been different from what they reported, that 
some Ss may have deliberately misrepresented 
their real acceptance of teaching. One test of 
this possibility was provided by the follow-up 
study. These results showed that the Index 
has a small predictive validity, but only for 
the women in the sample. 

Even this low predictive power is in one 
sense important, since a large battery of con- 
ventional measures used in a companion 
study conducted by the Teacher Education 
Research Project produced only scattered and 
inconclusive results (Levin, Hilton, & Leider- 
man, 1957). The appropriate keys of the 
Strong Vocational Interest Blank, which was 
included in the battery, provided no signifi- 
cant predictions. Furthermore, the differences 
in permanency among the Ss were slight. 
Only nine out of 94 Ss had clearly rejected 
teaching as a career. The 27 in the doubtful 
category were placed there because of stated 
doubts concerning their future commitment 
to teaching. 

The fact remains, however, that the pre- 
diction was not strong. It may be that con- 
tinued follow-up of these Ss will provide 
stronger discrimination. Nevertheless, for the 
time being, one or more of the following pos- 
sibilities must be entertained: (a) the verbal 
statements of people do not accurately re- 
flect their true attitude towards occupational 
roles, (b) role acceptance is not stable over
two-year periods, (c) accurate prediction from 
role acceptance without consideration of other 
personal characteristics, e.g., abilities, is not 
possible, (d) environmental barriers and con- 
straints so often prevent people from enact- 
ing their preferred occupational role that a 
prediction from a consideration of role ac- 
ceptance alone is not feasible. Accurate pre- 
diction may well require a scale designed to 
measure the likelihood of the S encountering 
situational factors beyond his control, e.g., 
racial or religious prejudice, legal restrictions, 
geographical location, family demands, eco- 
nomic pressures. 
SUMMARY


This research was undertaken to investi- 
gate a state of affairs which was termed 
alleged acceptance of the role of teaching. It 
might also be described as high interest or 
ego involvement in teaching or high embrace- 
ment of teaching or high identification with 
teachers, although each of these terms has 
shortcomings. This variable was selected as 
likely to be the best indicant of each S’s 
readiness to devote his energies to classroom 
teaching. 

On the basis of written answers to the 
open-ended questions of a long questionnaire, 
the extent to which 122 student teachers ap- 
peared to have accepted teaching as a role 
was rated by two judges. The correlations 
with the judges’ ratings of a collection of 
items derived from the questionnaire re- 
sponses pointed to certain attitudes and 
events common to those with high ratings. 
From the most highly related items, an Index 
was constructed and then cross-validated with 
an independent sample. The content of the 
surviving items appeared to be consistent
with the investigator’s conception of role ac- 
ceptance. 

The Index accounted for a small share of 
the differing permanence in teaching of the 
women teachers of the sample, at the end of 
their first year of teaching. But in the ab- 
sence of long-term criteria, the investigator 
will continue to refer to the role acceptance 
measured as alleged role acceptance. 
In constructing criteria of job performance,
psychologists have become increasingly cog- 
nizant of the need to cover all aspects of the 
job. This point of view has been expressed 
quite well by Ghiselli’s (1956) position that 
criteria are multidimensional, that the dimen- 
sions are unlikely to be equally important and 
that the dimensions should be differentially 
weighted by some method that does not as- 
sume a general factor of success. 

The multidimensionality approach to cri- 
terion development was employed in this 
study through the factor analysis of 20 meas- 
ures of foreman performance. The purposes 
of this research were (a) to identify criterion 
factors or dimensions that could be used sepa- 
rately as subcriteria, (b) to devise relevance 
weights that could be used to combine the 
factors into a composite criterion, (c) to de- 
termine which criterion factors were inade- 
quately covered by the measures used in the 
study, and (d) to examine the consistency of 
the factor structure in two plants. The prob- 
lem of developing factor score equations for 
reproducing criterion dimensions will not be 
considered in this paper. 

It should be emphasized that there is no 
assurance that factor analysis will reveal all 
dimensions of performance unless an exhaus- 
tive number of measures have been obtained 
on all possible aspects of the job. The most 
obvious omission in this study is an objective 
measure of quality, which was not available. 


1 Except for the relevance weighting procedure, the 
research reported in this paper is part of a PhD 
thesis submitted to the faculty of Purdue University. 
The author would like to express his appreciation 
for the guidance and assistance provided by the 
thesis committee, composed of Joseph Tiffin, Chairman, E. J. McCormick, B. J. Winer, and W. V. Owen. The author is also indebted to Orlo L. Crissey, Administrative Chairman of Personnel Evaluation Services, General Motors Institute, for making the data available and for contributing staff and clerical time to expedite the processing of the data.

2 Now with the B. F. Goodrich Company. 
Nevertheless, the measures utilized here are
fairly representative of those typically avail- 
able for production foremen. 


PROCEDURE 
Sample 


The sample consisted of production foremen (first 
line supervisors) in two plants of an automobile as- 
sembly division of a large corporation. The two 
plants perform identical production operations and 
are located in metropolitan areas of the Midwest 
(Plant X) and East (Plant Y). 

In order to insure a nominal level of experience 
on the part of the foremen used in the study, only 
those foremen were included in the sample who had 
at least two monthly scores on one or more of the 
criterion measures. This provided n’s of 102 in Plant 
X and 104 in Plant Y. Some of these foremen had 
only one score on certain variables due to vacations, 
different collection periods for the various measures, 
and other vagaries of the data collection process. 


Objective Measures 


Eleven objective measures were constructed from 
information supplied by the plants at weekly or 
monthly intervals. The collection period for the ob- 
jective measures extended from December 1956 to 
May 1957. The actual number of months for which 
data were available for any one measure ranged 
from three to six, varying for different criteria and 
for the two plants. 

The objective measures were Grievances, Turnover (voluntary quits), Absences, Suggestions, Hospital Passes (occupational injuries), Disciplines, Absentee Flexibility (hours spent by the foreman’s Utility Trainer in training men on jobs), Scrap, Expense Tools, Expense Processing Supplies, and Efficiency. There was no objective quality data that could be attributed to specific foremen with any degree of certainty. All of the objective measures were based on the performance of a foreman’s section as an operating unit. Absences, for example, referred to absences of hourly employees, not foreman absences.

In calculating monthly values for the first six measures (see Table 1), frequency counts for a foreman’s section were divided by the number of man-days worked during the time period to adjust for differences in section size. The Grievances index was multiplied by the number of employees submitting grievances to reduce the influence of chronic grievers and to increase the index value when dissatisfaction was more widespread. The Scrap index was merely the scrap cost in dollar units. Index values for Expense Tools and Expense Processing Supplies were dollar costs in excess of budget. Efficiency index values were calculated by the Work Standards Department; although the method of computing the index differed in the two plants, Efficiency was intended to be a measure of a section’s performance relative to standard time allowances.
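The size adjustment and the Grievances multiplier can be sketched as below. This is an assumption-laden illustration: the paper gives no scaling constant (such as a rate per 1,000 man-days), so the rate is left as a plain ratio, and the section figures are hypothetical. The two sections have the same grievance count and man-days, but the one where six different employees filed grievances receives the larger index.

```python
def rate_per_man_day(count, man_days):
    """Frequency count for a section divided by man-days worked in the period."""
    return count / man_days


def grievances_index(grievances, man_days, employees_grieving):
    """Grievance rate multiplied by the number of employees who filed grievances."""
    return rate_per_man_day(grievances, man_days) * employees_grieving


# Two hypothetical sections with the same number of grievances and man-days.
chronic = grievances_index(grievances=6, man_days=500, employees_grieving=1)
widespread = grievances_index(grievances=6, man_days=500, employees_grieving=6)
print(chronic, widespread)
```

Lower values count as better performance on this measure, so a section with one chronic griever is penalized less than one with the same number of grievances spread over many employees.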


Ratings 


During the six months from January through June of 1957, general foremen provided monthly ratings of the foremen reporting to them. The ratings consisted of alternating rankings of foremen on Overall Performance and eight functional areas of job performance, giving nine separate measures. The eight areas were Quantity, Quality, Cost Control, Organization and Planning, Employee Relations, Cooperation with Other Supervision, Safety, and Housekeeping. Responsibilities for each area were defined on the rating form, with the definitions abstracted from a divisionwide job description for production foremen. The Overall Performance Rating was to include performance on the eight areas plus any other functions that the rater thought important.
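Alternation ranking has the rater pick the best remaining foreman, then the worst remaining, then the next best, and so on, which keeps attention on the easier judgments at both ends of the list. The sketch assumes the general foreman’s judgment can be stood in for by a hypothetical `merit` function; the foremen and their values are invented for illustration.

```python
def alternation_rank(foremen, merit):
    """Return foremen ordered best to worst by alternation ranking.

    `merit` is a hypothetical stand-in for the general foreman's judgment:
    any function mapping a foreman to a comparable value (higher = better).
    """
    remaining = list(foremen)
    top, bottom = [], []
    pick_best = True
    while remaining:
        if pick_best:
            choice = max(remaining, key=merit)
            top.append(choice)
        else:
            choice = min(remaining, key=merit)
            bottom.append(choice)
        remaining.remove(choice)
        pick_best = not pick_best
    # Bottom picks were made from the worst upward, so reverse them.
    return top + bottom[::-1]


judged = {"A": 7, "B": 3, "C": 9, "D": 5, "E": 1}  # hypothetical impressions
order = alternation_rank(judged, judged.get)
ranks = {foreman: i + 1 for i, foreman in enumerate(order)}
print(order)  # ['C', 'A', 'D', 'B', 'E']
print(ranks)
```

With a consistent merit function the result matches a simple sort; the procedure matters for human raters, whose judgments are not fully consistent.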


Score Transformations 

Original scores on the objective variables were transformed to normal distributions because scores on most of the measures were highly skewed, piling up at the low end of the scales where there was a lower limit of zero. It was felt that the abilities underlying the criterion scores could be approximated more closely by the traditional normal curve.

In addition to non-normal distributions, scores on the objective measures were subject to biases in the form of score differences attributable to conditions inherent in different plants, departments, shifts, and months. Because of the possibility of biases, adjustments were made to equalize monthly score means within each plant-department-shift unit. This adjustment provided a set of scores generally representative of scores unbiased by plants, departments, shifts, or months.

Score transformations and adjustments on the objective measures were made by ranking the original monthly index values within organizational subgroups, then converting the ranks to normalized scores (T scores) with a mean of 50 and a standard deviation of 10 (Edwards, 1954, p. 512). In ranking the index values, a rank of 1 always indicated the best performance. On Absentee Flexibility, Suggestions, and Plant Y’s Efficiency, better performance was interpreted as higher scores. On the remaining objective measures, lower scores were taken as better performance.

The ratings, received from general foremen as monthly rankings, were also converted to T scores, which normalized rating scores within general foremen’s areas and equalized score means from general foreman to general foreman. Since a general foreman’s area was smaller than a department, rating score means were unbiased (equalized) for months, plants, departments, and shifts, as well as for general foremen.
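The rank-to-T conversion can be sketched with the standard library’s normal distribution. The percentile convention (n - rank + 0.5) / n, and the choice that the best rank receives the highest T score, are assumptions; Edwards’s (1954) tabled procedure may differ in detail, and ties are not handled. The subgroup label and Scrap costs are hypothetical.

```python
from collections import defaultdict
from statistics import NormalDist

_UNIT_NORMAL = NormalDist()  # mean 0, standard deviation 1


def ranks_to_t_scores(ranks):
    """Normalized T scores (mean 50, SD 10) for ranks 1..n, where 1 = best.

    Assumes the percentile (n - rank + 0.5) / n, so the best rank gets the
    highest T score; tied ranks are not handled.
    """
    n = len(ranks)
    return [50 + 10 * _UNIT_NORMAL.inv_cdf((n - r + 0.5) / n) for r in ranks]


def t_scores_within_subgroups(records, higher_is_better):
    """records: iterable of (subgroup, foreman, raw_index_value).

    Ranks raw values within each subgroup (for example one plant-department-
    shift-month cell) and returns {(subgroup, foreman): T score}.
    """
    by_group = defaultdict(list)
    for subgroup, foreman, value in records:
        by_group[subgroup].append((foreman, value))
    result = {}
    for subgroup, members in by_group.items():
        # Best performance first, so that it receives rank 1.
        ordered = sorted(members, key=lambda m: m[1], reverse=higher_is_better)
        t_scores = ranks_to_t_scores(list(range(1, len(ordered) + 1)))
        for (foreman, _), t in zip(ordered, t_scores):
            result[(subgroup, foreman)] = t
    return result


# Hypothetical Scrap costs in dollars (lower is better) for one subgroup-month.
cell = "X-trim-shift1-Jan"
scrap = [(cell, "F1", 120.0), (cell, "F2", 40.0), (cell, "F3", 310.0),
         (cell, "F4", 75.0), (cell, "F5", 90.0)]
for key, t in sorted(t_scores_within_subgroups(scrap, higher_is_better=False).items()):
    print(key[1], round(t, 1))
```

Because every subgroup is ranked separately, each cell’s T scores center on 50, which is what equalizes the score means across plants, departments, shifts, and months.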

Besides his monthly scores, each foreman was given an overall T score on every measure by calculating mean T scores for all foremen, converting the mean T scores to percentile distributions within each plant, and transforming the plantwide percentiles for each criterion variable to T scores (Edwards, 1954, p. 511). For those foremen having only one score on a measure, the single score rather than a mean was used in the above procedure.

It should be remembered that although the 20 variables were collected from December through June of one product model year, none of the measures were obtained for all seven months. This created no difficulties in assessing the reliability of monthly scores. However, in intercorrelating the variables there was a choice of either using only the months in common to all of the variables, thereby losing a major part of the sample, or using all of the data available for each measure to get a single, average score representing a foreman’s standing on that measure. The latter alternative was chosen because it was felt that more dependable and representative results would be found by using all of the data.
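The three steps that produce a foreman’s overall T score (average the monthly T scores, convert the averages to plantwide percentiles, map the percentiles back to T scores) can be sketched as follows. The midpoint percentile-rank convention, (number below + half the ties) / n, is an assumption; the foremen and scores are hypothetical, and F2 illustrates a foreman with a single monthly score used as is.

```python
from statistics import NormalDist, mean

_UNIT_NORMAL = NormalDist()


def overall_t_scores(monthly):
    """monthly: {foreman: [monthly T scores on one measure]} for one plant.

    Averages each foreman's monthly T scores (a single score is used as is),
    converts the means to plantwide percentile ranks with the midpoint
    convention, and maps those percentiles back to T scores.
    """
    means = {f: mean(scores) for f, scores in monthly.items()}
    values = list(means.values())
    n = len(values)
    result = {}
    for foreman, m in means.items():
        below = sum(1 for v in values if v < m)
        tied = sum(1 for v in values if v == m)
        percentile = (below + 0.5 * tied) / n  # always strictly between 0 and 1
        result[foreman] = 50 + 10 * _UNIT_NORMAL.inv_cdf(percentile)
    return result


plant_x_scrap = {"F1": [55, 61], "F2": [48], "F3": [40, 44, 42], "F4": [50, 52]}
for foreman, t in overall_t_scores(plant_x_scrap).items():
    print(foreman, round(t, 1))
```

Re-normalizing the means matters because averaging several T scores shrinks their spread; the final step restores a standard deviation of about 10 within each plant.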


Reliabilities 


The monthly T scores were used in estimating the reliability of the measures across time. Only those foremen receiving two or more monthly scores on a criterion could be used in the reliability estimates. The number of monthly scores on any measure was not the same for all foremen. Reliability estimates were computed from Ebel’s (1951) intraclass analysis of variance method for incomplete sets of scores (unequal observations from person to person).
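An intraclass reliability for unequal numbers of scores can be sketched from a one-way analysis of variance with foremen as the groups. This uses the standard textbook formulas with an effective group size k₀ = (N - Σnᵢ²/N)/(a - 1); Ebel’s (1951) computing formulas may differ in detail, and the monthly scores are hypothetical. The sketch also checks the property noted in the text: the reliability of a mean of k₀ scores equals the single-score reliability stepped up k₀ times by the Spearman-Brown formula.

```python
from statistics import mean


def spearman_brown(r, k):
    """Reliability of a mean of k scores, given the reliability r of one score."""
    return k * r / (1 + (k - 1) * r)


def intraclass_reliability(scores):
    """One-way analysis-of-variance reliability for unequal numbers of scores.

    scores: one list of monthly scores per foreman, each with two or more scores.
    Returns (single-score reliability, reliability of a mean of k0 scores, k0).
    """
    a = len(scores)
    sizes = [len(s) for s in scores]
    n_total = sum(sizes)
    grand = mean(x for s in scores for x in s)
    ss_between = sum(len(s) * (mean(s) - grand) ** 2 for s in scores)
    ss_within = sum((x - mean(s)) ** 2 for s in scores for x in s)
    ms_between = ss_between / (a - 1)
    ms_within = ss_within / (n_total - a)
    # Effective number of scores per foreman when the numbers are unequal.
    k0 = (n_total - sum(n * n for n in sizes) / n_total) / (a - 1)
    single = (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)
    average = (ms_between - ms_within) / ms_between
    return single, average, k0


# Hypothetical monthly T scores for four foremen on one measure.
monthly = [[52, 55, 50], [40, 43], [61, 58, 63, 60], [47, 45]]
single, average, k0 = intraclass_reliability(monthly)
print(round(single, 2), round(average, 2), round(k0, 2))
print(round(spearman_brown(single, k0), 2))  # equals the average-score value
```

Here k₀ works out to 8/3, between the smallest and largest number of scores per foreman and close to their harmonic mean.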

Reliabilities were calculated for both single scores and mean scores for k₀ months (k₀ being an approximation of the harmonic mean of the number of scores for each foreman). Ebel’s method of determining the reliability of an average score is equivalent to stepping up the reliability of a single score k₀ times in the Spearman-Brown formula. The Spearman-Brown formula was used to estimate the reliability of mean scores for three, six, and nine months. The purpose of these latter reliabilities was to gain information regarding the adequacy of reliabilities of criterion data collected for varying numbers of months. All reliability estimates were made separately for the two plants.


Factor Analyses 


The overall T scores for each foreman on the criterion variables were used in obtaining the intercorrelations of the 20 measures. Intercorrelation matrices for Plant X and Plant Y were calculated on an IBM 650 computer.

Factor analyses for the two intercorrelation matrices were run on Purdue’s Datatron computer using principal components solutions. Communality estimates were the squared multiple correlations of each variable with the other 19 variables. The computer extracted 20 factors from each matrix. Factors were selected according to the size of their latent roots; enough factors were retained to make the sum of their latent roots roughly equal to the sum of the communality estimates. In Plant X the sum of the communalities was 9.15 and the sum of the five largest latent roots was 9.16. For Plant Y the sum of the first five latent roots was 9.01 compared to 8.92 as the sum of the communality estimates. A high degree of correspondence between the estimated communalities and the obtained communalities indicated that the common-factor variance in the intercorrelation matrices was sufficiently accounted for by the retained factors.
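Two pieces of this procedure are easy to sketch without a matrix library: the squared multiple correlation of each variable with the others, which equals 1 - 1/(R⁻¹)ᵢᵢ for a correlation matrix R, and the rule for how many factors to keep. Reading “roughly equal” as “the number of largest roots whose sum comes closest to the communality sum” is an assumption, the latent roots below are hypothetical (chosen so the five largest sum to 9.16, as in Plant X), and extracting the roots themselves (an eigenvalue problem) is not shown.

```python
def invert(matrix):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def squared_multiple_correlations(R):
    """Communality estimates: SMC of each variable with all the others."""
    inv = invert(R)
    return [1.0 - 1.0 / inv[i][i] for i in range(len(R))]


def factors_to_retain(latent_roots, communality_sum):
    """Number m of largest roots whose sum comes closest to the communality sum."""
    roots = sorted(latent_roots, reverse=True)
    best_m, best_gap, running = 0, abs(communality_sum), 0.0
    for m, root in enumerate(roots, start=1):
        running += root
        gap = abs(running - communality_sum)
        if gap < best_gap:
            best_m, best_gap = m, gap
    return best_m


R = [[1.0, 0.5, 0.4],
     [0.5, 1.0, 0.3],
     [0.4, 0.3, 1.0]]
print([round(s, 3) for s in squared_multiple_correlations(R)])

# Hypothetical latent roots whose five largest sum to 9.16, as in Plant X.
roots = [4.10, 1.90, 1.30, 1.05, 0.81, 0.45, 0.30, 0.12]
print(factors_to_retain(roots, communality_sum=9.15))  # 5
```

For two variables the SMC reduces to the squared correlation, a quick check that the inverse-diagonal formula is being applied correctly.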

The resulting factors were rotated graphically and 
orthogonally to approximate simple structure. 


Relevance Weights 


For the purposes of this research, a criterion was 
defined as a yardstick for evaluating a foreman’s 
contribution to the success of organizational opera- 
tions. Some estimate of relevance was desired which 
would indicate the relative importance of the cri- 
terion dimensions to the success of the organization. 
Basing criterion relevance values on either factor 
loadings or the proportion of common-factor vari- 
ance accounted for by a factor was not justified be- 
cause the domain of total criterion performance was 
not necessarily completely covered nor proportionally 
sampled by the 20 measures. Criterion relevance was 
established from the judgments of production de- 
partment superintendents, who were one supervisory 
level higher than the general foremen. 

Eight superintendents, four from each plant, were 
asked (a) to rank the criterion measures according 
to the extent to which they would reflect a fore- 
man’s contribution to the success of their depart- 
ment’s operations and (b) to identify measures that 
they thought were completely irrelevant. The names 
and descriptions of the criterion measures were typed 
on separate cards, which were sorted into four decks.
In individual sessions the superintendents ranked the 
cards in the four decks and collated the decks into 
an overall rank order. 

No superintendent named any measure as being 
nonrelevant; consequently, it was assumed that all 
measures embodied some relevance to the effective- 
ness of production operations within a department. 
The criterion measures were given relevance scores 
from each rater by reversing the rank order (the 
highest ranked variable was given a score of 20, the 
second highest was given a score of 19, etc.). Interrater agreement reliabilities were computed for mean relevance scores from the superintendents within each plant (.90 in Plant X and .82 in Plant Y) and across both plants (.92) by using Ebel’s (1951) method for finding the reliability of complete sets of ratings. As a direct check on the agreement between plants, a rank order correlation between mean
criterion relevance scores within the two plants was 
computed. The between-plant rank order correlation 
was .74. Because of this rank order correlation and 
the high reliability for average scores across eight 
raters, it was decided to use a single relevance score 
applicable to both plants for each criterion variable. 
A measure’s final relevance score was the mean score 
from all eight superintendents. 
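The reversed-rank scoring and the between-plant check can be sketched as follows. Each rater’s top-ranked measure receives the highest score (20 when all 20 measures are ranked), scores are averaged over raters, and the rank order correlation is computed as a Pearson r on ranks, with tied mean scores sharing an average rank. The five measures and four rankings are hypothetical, and the tie handling is an assumption since the text does not describe it.

```python
from statistics import mean


def relevance_scores(rank_order):
    """rank_order: measure names, most relevant first. With 20 measures the
    top-ranked measure scores 20, the next 19, and so on."""
    n = len(rank_order)
    return {name: n - i for i, name in enumerate(rank_order)}


def mean_relevance(rank_orders):
    """Mean relevance score of each measure across several raters."""
    per_rater = [relevance_scores(order) for order in rank_orders]
    return {name: mean(scores[name] for scores in per_rater) for name in per_rater[0]}


def average_ranks(values):
    """Ranks 1..n, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_order_correlation(x, y):
    """Spearman rank-order correlation: Pearson r computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical rankings of five measures by two superintendents in each plant.
plant_x = mean_relevance([["Quality", "Efficiency", "Scrap", "Safety", "Suggestions"],
                          ["Efficiency", "Quality", "Safety", "Scrap", "Suggestions"]])
plant_y = mean_relevance([["Quality", "Scrap", "Efficiency", "Safety", "Suggestions"],
                          ["Quality", "Efficiency", "Scrap", "Suggestions", "Safety"]])
names = sorted(plant_x)
rho = rank_order_correlation([plant_x[m] for m in names], [plant_y[m] for m in names])
print(round(rho, 2))
```

Averaging over raters before correlating is what makes a single pooled relevance score per measure defensible when, as reported, the two plants’ mean scores agree closely.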

The final relevance score for each criterion meas- 
ure was distributed among the meaningful factors in 
proportion to the measure’s factor loadings of .20 
and higher (an r of .20 with 100 df is significant at 
the .05 level). One example of the distribution of 
relevance scores to factors is a measure whose two 
factor loadings above .20 were .34 and .44: the rele- 
vance score was divided between the two factors in 
the ratio of 3 to 4. Other allocations of relevance 
scores were made in a similar manner. If a variable 
had only one factor loading as high as .20, all of 
the relevance score was given to that factor. All criterion measures had at least one factor loading of .20 or higher, making it possible to assign all of the relevance scores to the factors.

The criterion relevance values assigned to the fac- 
tors were given the signs of the corresponding factor 
loadings and summed algebraically for each factor.
The total factor relevance values were divided by 
the number of variables contributing relevance to the 
factors; this step yielded a mean relevance weight 
for every factor. A factor’s mean relevance weight 
would be more representative of that factor’s rele- 
vance than a total value because the latter would 
be more dependent on the number of different kinds 
of variables included in the factor analysis. The av- 
erage factor relevance weights were rounded to the 
nearest integer to serve as estimates of factor rele- 
vance. 
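The allocation and averaging steps can be sketched as two small functions. Treating “loadings of .20 and higher” as absolute values of .20 or more is an assumption, as are the relevance scores and loadings of the three example measures; the first call reproduces the worked case in the text, where loadings of .34 and .44 split a score roughly 3 to 4.

```python
from collections import defaultdict

CUTOFF = 0.20  # loadings of .20 or higher (absolute value) share relevance


def allocate(relevance, loadings):
    """Split one measure's relevance score among factors in proportion to its
    loadings of .20 or higher, giving each share the sign of its loading."""
    kept = {f: load for f, load in loadings.items() if abs(load) >= CUTOFF}
    if not kept:
        raise ValueError("measure has no loading of .20 or higher")
    total = sum(abs(load) for load in kept.values())
    return {f: relevance * abs(load) / total * (1 if load > 0 else -1)
            for f, load in kept.items()}


def factor_relevance_weights(measures):
    """measures: {name: (mean relevance score, {factor: loading})}.

    Sums the signed shares algebraically for each factor, divides by the number
    of measures contributing to that factor, and rounds to the nearest integer
    (Python's round() sends exact .5 ties to the even integer).
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for relevance, loadings in measures.values():
        for factor, share in allocate(relevance, loadings).items():
            sums[factor] += share
            counts[factor] += 1
    return {factor: round(sums[factor] / counts[factor]) for factor in sums}


# The worked case from the text: loadings of .34 and .44.
print(allocate(14.0, {"I": 0.34, "II": 0.44}))

# Hypothetical relevance scores and loadings for three measures.
measures = {
    "Scrap": (12.0, {"I": 0.34, "II": 0.44, "III": 0.05}),
    "Quality rating": (17.0, {"II": 0.61}),
    "Absences": (9.0, {"I": -0.25, "III": 0.52}),
}
print(factor_relevance_weights(measures))  # {'I': 1, 'II': 12, 'III': 6}
```

Dividing by the number of contributing measures, rather than keeping the raw sum, keeps a factor’s weight from growing merely because more of its measures happened to be included in the analysis.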

The relevance weighting procedure was repeated 
using two other approaches: once by getting sepa- 
rate factor relevance values for the two plants from 
relevance judgments pooled within plants instead of 
across plants, and again by converting relevance 
ranks to T scores and assigning to the factors only 
that portion of a criterion’s relevance T score accounted for by its factor loadings of .20 or higher. All three relevance weighting procedures gave virtually identical results, so only one method is explained
here. 
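The paper does not spell out how relevance ranks were converted to T scores. One common way to normalize ranks is assumed here rather than taken from the study. Each rank is mapped to a percentile, then to a standard normal deviate z with the standard library's `statistics.NormalDist`, and rescaled as T = 50 + 10z. The later step, keeping only the portion of the T score accounted for by loadings of .20 or higher, is not reproduced because the paper does not give its formula.

```python
from statistics import NormalDist


def ranks_to_t_scores(ranks):
    """Normalized T scores for ranks 1..n, where rank 1 is the most relevant measure.

    Assumption: rank r of n is placed at percentile (n - r + 0.5) / n, converted to a
    standard normal deviate z, and rescaled as T = 50 + 10 * z.
    """
    n = len(ranks)
    unit_normal = NormalDist()  # mean 0, standard deviation 1
    return [50 + 10 * unit_normal.inv_cdf((n - r + 0.5) / n) for r in ranks]


# A hypothetical ranking of three measures by one superintendent.
print([round(t, 1) for t in ranks_to_t_scores([1, 2, 3])])
```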


RESULTS 

Reliabilities 

Reliabilities for single monthly scores on 
the objective measures were quite low. In 
Plant X they ranged from .03 to .59 with a 
median of .35; in Plant Y they ranged from 
.07 to .65 with a median of .27. Reliabilities 
of single monthly scores on the ratings were 
higher, ranging from .46 to .69 in Plant X 
and from .39 to .66 in Plant Y. 
When the reliabilities were stepped up by the
Spearman-Brown formula for average scores 
taken across three, six, and nine months, it 
was found that satisfactory reliability could 
be obtained for most of the measures by tak- 
ing averages of several monthly scores. Nearly 
all of the rating scales in both plants reached 
a reliability of .70 or higher for an average 
of three monthly scores. Four measures— 
Grievances, Turnover, Suggestions, and Disci- 
plines—failed to attain a reliability of .70 in 
either plant for mean scores from as many as 
nine months; this was also true for Absentee 
Flexibility in Plant X. 
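The Spearman-Brown step-up mentioned above has the standard form r_kk = k·r_11 / (1 + (k − 1)·r_11). Here r_11 is the reliability of a single monthly score and k is the number of months averaged. The formula treats the monthly scores as parallel measures. The sketch below also inverts the formula to find how many months reach a target reliability of .70. Both function names are hypothetical.

```python
import math


def spearman_brown(r1, k):
    """Reliability of the mean of k monthly scores, given single-month reliability r1."""
    return k * r1 / (1 + (k - 1) * r1)


def months_needed(r1, target=0.70):
    """Smallest whole number of months whose mean score reaches the target reliability."""
    k = target * (1 - r1) / (r1 * (1 - target))
    return math.ceil(k - 1e-9)  # small tolerance guards against float round-off


for k in (3, 6, 9):
    print(k, round(spearman_brown(0.35, k), 2))
print(months_needed(0.35))
```

Take Plant X's median single-month reliability of .35 for the objective measures. Stepping it up gives about .62, .76, and .83 for three, six, and nine months, and `months_needed(0.35)` returns 5. A rating with a single-month reliability of .46 reaches about .72 with three months, which fits the statement that nearly all rating scales reached .70 with three months.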

Table 1 gives r_kk, the reliability of mean scores from k months, for the 20 criterion measures. This r_kk is the reliability of mean overall scores, which were used in computing the intercorrelations, for those foremen having two or more scores. Intercorrelations and factor analyses were based on the total plant n’s of 102 and 104, but the reliability n’s were smaller. The difference between the total plant n’s and the reliability n’s shown in Table 1 is the number of foremen who had only one monthly score on a measure. Although r_kk is not the actual reliability of all intercorrelated scores on those variables for which some foremen had only one score, it can be regarded as a close approximation.


Intercorrelations and Factor Analyses 


The intercorrelation matrices and principal 
component factor loadings for the two plants 
can be found elsewhere (Turner, 1959), so 
they are not given here. However, some as- 
pects of the intercorrelations bear mention- 
ing. The objective measures in Plant X and 
Plant Y had low correlations with other ob- 
jective measures and with the ratings; nearly 
all of these significant r’s were in the .20’s. 
Low correlations resulted in low communali- 
ties for the objective measures. Intercorrela- 
tions among the rating scales were fairly high 
(generally in the .50’s, .60’s, and .70’s). 

The intercorrelations were examined to de- 
termine if objective measures were correlated 
with ratings that might logically be expected 
to cover similar kinds of job achievement. 
The objective cost measures (Scrap, Expense Tools, and Expense Processing Supplies) were not significantly correlated with Cost
Control Ratings in either plant. The Em- 
ployee Relations Rating in Plant Y was sig- 
nificantly correlated (.21) with only one (Ab- 
sences) of the six (Variables 1 through 6) 
personnel measures, while Plant X’s Em- 
ployee Relations Ratings were significantly 
correlated with Grievances (.21) and Hos- 
pital Passes (.26). In both plants the objec- 
tive Efficiency index had near-zero r’s with 
Quantity and Quality Ratings. It is obvious 
that ratings and objective data are not neces- 
sarily equivalent, even when they supposedly 
measure similar things. 

The rotated factor loadings are shown sepa- 
rately for Plant X and Plant Y in Table 1. 
Four interpretable factors were found for 
each rotated factor matrix. The fifth factors, 
having only one loading above .30, appear 
to be residual factors. Factors I and II are 
the same in both plants, but the third and 
fourth factors have plant differences. 

Dimension I: Job performance reputation. 
Factor I is a rating dimension on which all 
of the ratings have high loadings. Factor I 
accounts for nearly all of the rating com- 
munalities and is anchored by the Overall 
Performance Rating. None of the objective 
variables have loadings of any consequence 
on Factor I. Factor I “cross-validated” quite
well from Plant X to Plant Y. 

The Overall Performance Rating, with Fac- 
tor I loadings of .95 and zero or near-zero 
loadings on other factors, is sufficient to 
identify this dimension. It appears that ad- 
ditional ratings are superfluous. 

The low or negligible loadings of the ratings on the remaining factors are a further indication that ratings and objective data are
far from interchangeable. Ratings of foreman 
performance seem to be determined primarily 
by a general reputation for job performance. 
This reputation is not necessarily completely 
divorced from a foreman’s actual contribu- 
tion to the organization. Nevertheless, the 
low relationship of ratings with other factors 
and with individual objective measures suggests the possibility that some irrelevant considerations might affect job performance reputation.
Dimension II: Employee relations. Factor
II in both plants is characterized by few ab- 
sences, few disciplines, and a tendency to 
stay within the budget allotment for expense 
tools. Of the six objective personnel measures, 
Hospital Passes and Suggestions do not have 
consistent positive correlations with this fac- 
tor. Grievances and Turnover have low to 
high positive loadings in the two plants. Since 
Factor II seems to be marked by a good re- 
lationship between foremen and their sub- 
ordinates, it is named Employee Relations. 

Dimension III: Scrap vs. organization of 
production operations. The plant-to-plant 
similarities on the third factor are high posi- 
tive loadings for Scrap, negative loadings for 
Suggestions and evidence of bipolarity. 

In Plant X a good showing on Scrap is ac- 
companied by a poor record in Grievances 
and a little difficulty in meeting production 
standards (Efficiency). The negative loadings 
for Variables 1, 4, 5, and 11 cause one to 
suspect the existence of bipolarity in Plant 
X’s third factor, although the nature of the 
bipolarity is unclear. 

The bipolarity is clarified somewhat in 
Plant Y by moderate negative loadings for 
Suggestions, Absentee Flexibility, and Ex- 
pense Processing Supplies. Foremen in Plant 
Y who do well on Scrap tend to have fewer 
suggestions from subordinates, provide less 
on-the-job training for employees, and exceed budget allowances for processing supplies.


High performance on Factor III appears to 
be made at the expense of activities that, for 
lack of a better term, might be referred to as 
organization of production operations. The 
specific areas on which competence is dimin- 
ished differ in the two plants. Something 
other than Scrap performance that is not 
measured by the 20 variables could also be 
tied to the high end of Factor III. 

Dimension IV. This is another factor for 
which plant differences exist, but there are 
some similarities from Plant X to Plant Y. 
The number of negative loadings, although most are small, suggests the presence of bipolarity in the two plants. Turnover has moderate positive loadings and Expense Processing Supplies has low negative loadings in both plants. The most noticeable plant differences are the sizes of the loadings for Efficiency (.25 in Plant X and .47 in Plant Y) and Absentee Flexibility (−.44 in Plant X and −.20 in Plant Y).

High performance on Plant X’s Factor IV 
consists of providing little on-the-job training 
for employees, less turnover, and a somewhat 
better-than-average ability to produce within 
work standards estimates. Plant X’s Factor 
IV might tentatively be called flexibility vs. 
stability in job assignments. 

Factor IV in Plant Y is typified by pro- 
ducing jobs in less than standard time, re- 
ceiving relatively more suggestions from sub- 
ordinates and experiencing fewer voluntary 
quits among hourly employees. The high end 
might be called productivity or smoothness 
of production operations, but the opposite 
pole is poorly defined by the small negative 
loadings. 


Factor Relevance Weights 


The integral factor relevance weights can be seen in Table 2. These weights can be interpreted as approximate indexes of the relative importance of the four factors.
Superintendents had more confidence in the 
general foremen’s ratings than in objective 
measures; Factor I, the rating dimension, is 
about twice as relevant as Factor II. The 
first two factors were the same in both plants, 
so similar relevance weights would be expected 
for Factors I and II in Plants X and Y. It is 
surprising, however, to find that the relevance 
values for Factors III and IV consistently round to zero in spite of their plant differences. The zero relevance weights were obtained because positive and negative loadings on the bipolar factors are counterbalanced in terms of judged relevance. This does not mean that high performance on Factors III and IV contributes nothing to the success of production operations, but that high factor performance is offset by poor performance on the negatively loaded variables. In other words, a low score would be as good as a high score on the third and fourth factors.

TABLE 2
FACTOR RELEVANCE WEIGHTS DERIVED FROM JUDGMENTS OF SUPERINTENDENTS

                 Relevance Weights
Factor        Plant X       Plant Y

The relevance weights shown in Table 2 
could be used in combining factors into a 
composite criterion. Because of the failure to 
establish a preferred end for the bipolar third 
and fourth factors, these two factors should 
be omitted from a composite criterion. 


CONCLUSIONS AND DISCUSSION 


Several rather general conclusions can be 
made for both plants from the results of this 
study, and it is the writer’s hypothesis that 
the same conclusions are applicable to other 
plants in the assembly division. The results 
also have implications for criterion develop- 
ment that might be undertaken in other or- 
ganizations. 

Single monthly scores on criterion meas- 
ures tend to have inadequate reliability across 
time. Averages of several monthly scores are 
needed to attain a satisfactory level of reli- 
ability. Reliability across time would seem 
important if a criterion score is to be a 
dependable index of an individual’s standing 
on a measure. A person conducting research 
should not be surprised if he finds it necessary to collect criterion data for several months.


There is little relationship between objec- 
tive data and ratings which purportedly cover 
similar job areas. The equivalence of ratings 
and objective criterion measures should never 
be assumed. Moreover, the nine ratings used 
in this study represent a single criterion di- 
mension which is almost completely defined 
by the Overall Performance Rating. This fac- 
tor is comparable to “general” factors found 
by others from factor analyses of ratings 
(Creager & Harding, 1958; Grant, 1955). It 
might be argued that supervisory opinion is 
a realistic, legitimate criterion. Nevertheless, 
one might question whether ratings reflect a 
person’s contribution to organizational effectiveness when ratings can be shown to be unrelated to relevant objective records.

The first two dimensions, Job Performance 
Reputation and Employee Relations, are the 
same in Plant X and Plant Y, and their coun- 
terparts have been identified for foremen in 
other companies (Creager & Harding, 1958). 
The third and fourth dimensions, which to- 
gether comprise the cost-production segment 
of criterion performance, have plant differ- 
ences in factor content, but they are consist- 
ently bipolar in factor structure and rele- 
vance. Additional objective measures directly 
related to production activities are needed to 
clarify the content of Dimensions III and IV; 
an objective measure of quality would be es- 
pecially desirable. 

The bipolarity of Dimensions III and IV 
is indicative of a loss-gain compensation in 
which good performance on some aspects of 
the job is accompanied by diminished pro- 
ficiency on equally important areas. It ap- 
pears that there is more than one pattern of 
foreman success and that it may be unre- 
alistic to expect foremen to do well on all as- 
pects of the job. 


SUMMARY 

Twenty criterion variables, 9 ratings and 
11 objective measures, were collected for pro- 
duction foremen in two automotive assem- 
bly plants. Four meaningful dimensions were 
identified by factor analyzing the measures 
separately for each plant. Relevance weights 
for the dimensions were derived from super- 
intendents’ relevance rankings of the 20 
variables. 

The first two dimensions are the same in 
the two plants. Dimension I, Job Perform- 
ance Reputation, is a rating factor. Dimen- 
sion II, Employee Relations, represents ob- 
jective personnel measures. The third and 
fourth dimensions, consisting primarily of ob- 
jective cost and production measures, have 
plant differences in their specific factor con- 
tent and are incompletely covered by the 
available measures. However, Dimensions III 
and IV are consistent from plant to plant in 
the bipolarity of both their factor structure 
and their relevance weights. 
A number of studies (Asch, 1951; Laughlin,
1954; Mann, 1957) have pointed to the ef- 
fectiveness of group discussion and nondirec- 
tive procedures in promoting attitude change 
and improved emotional adjustment. These 
findings are further supported by Di Vesta’s 
(1954) evaluation of a human relations course 
for military hospital administrators. This lat- 
ter study is unique, however, in that both di- 
rective (lecture) and nondirective approaches 
to the subject matter seemed to produce simi- 
lar positive effects. There is the suggestion 
here that nondirective procedures do not con- 
stitute a necessary condition for change. Ap- 
parently, training courses based on the lec- 
ture method can also have an effect on the 
attitudes of a group. 

An opportunity to investigate this conclu- 
sion further was presented to the author when 
he was asked to conduct a course in psychol- 
ogy for 72 supervisors in the Research and 
Development Department of a large corpora- 
tion. Management appraisals carried out in 
the department had pointed to a situation 
which is apparently not uncommon in re- 
search groups of this type. A number of 
the supervisors had achieved their positions 
largely as a result of scientific accomplish- 
ments. Lacking interest in supervisory work, 
many had continued to emphasize individual 
research at the expense of supervisory re- 
sponsibility. As a result, their subordinates 
were frequently given only a bare minimum 
of guidance and the general level of perform- 
ance in several of the research groups was 
below a satisfactory level. It was hoped that 
a course in psychology dealing with various 
facets of supervision might serve to foster a 
more favorable attitude toward supervisory 
work among these scientists, many of whom 
held graduate degrees in chemistry and en- 
gineering. 
s Administration, University of Oregon


THE COURSE


Training was carried out during 10 one- 
and-a-half-hour sessions given at weekly in- 
tervals. There were four groups. However, 
the membership of these groups tended to 
fluctuate somewhat as men shifted from one 
to another in accordance with the demands 
of their work. Participation in the course was 
not on a strictly voluntary basis. Most of the 
72 men were asked to attend and attendance 
was recorded. The average man failed for one 
reason or another to participate in one of the 
10 sessions. There was no reading material
either required or recommended and no ex- 
aminations covering the content of the course 
were administered. 

Although the course was primarily a lecture course, the groups were small enough that discussion was facilitated. The interaction was, however,
almost exclusively between instructor and su- 
pervisor; rarely between one supervisor and 
another. The majority of these discussions 
dealt with questions directed to the instruc- 
tor, questions aimed at clarifying points made 
during the lectures and obtaining information 
relevant to a specific situation the supervisor 
had faced. In addition, the instructor spent 
an average of at least a half-hour after each 
session discussing specific problems with indi- 
viduals and smaller groups. 

The lecture content was focused on the 
various reasons why a man might fail to per- 
form effectively in the work situation. In this 
respect it followed closely the category scheme 
detailed in Breakdown and Recovery (Ginz- 
berg, Miner, Anderson, Ginsburg, & Herma, 
1959). There were frequent references to theoretical formulations and specific research studies. Every effort was made to present the material in a way that would give the supervisors a feeling of being in a position of responsibility, faced with the necessity of diagnosing the reasons behind the ineffective performance of a subordinate and taking appropriate action. Thus, those participating in
the course were constantly reminded of their 
supervisory role throughout the sessions. In 
addition they were given information osten- 
sibly intended to improve their understanding 
of other people. Many, no doubt, as the 
course progressed, began to relate this infor- 
mation to their own performance failures. 
During the final session this latter emphasis 
was made quite explicit. The focus shifted 
from the ineffective subordinate to the in- 
effective supervisor. Here, for the first time, 
an attempt was made to explain the sources 
of anxiety inherent in the supervisory role. 
Reference was made to the physical anxiety 
so commonly experienced by top level execu- 
tives of large corporations (Miner & Culver, 
1955). This lecture took a form in many 
ways analogous to an expanded psychoana- 
lytic interpretation. 

The general nature of the material covered 
in each lecture is outlined below. 

Lecture 1. The results of various studies 
carried out by the Survey Research Center 
of the University of Michigan (Kahn, 1956; 
Kahn & Katz, 1953) dealing with the differ- 
ences between highly productive and less pro- 
ductive industrial work groups were presented. 
Particular emphasis was placed on the role of the supervisor in fostering effective performance.


Lecture 2. The relationship between physi- 
cal and intellectual factors and performance 
was discussed, the latter receiving major at- 
tention. The subjects covered included motor 
dexterity, theory of intelligence, special abili- 
ties, and job placement. Studies and theoretical formulations were drawn largely from Vernon (1950) and Miner (1957).

Lecture 3. A brief survey of studies dealing with the relationship between emotional factors and performance was followed by a more detailed analysis of the Markowe (1953) and Miner and Anderson (1958) researches. There was also some discussion of various psychosomatic disorders based on studies presented in Life Stress and Bodily Disease (Association for Research in Nervous and Mental Disease, 1950), of alcoholism, and of methods of identifying emotional pathology.

Lecture 4. Although some attempt was made 
to discuss the literature on motivation in a 
general way, the primary emphasis was on 
work motivation. Considerable time was de- 
voted to the subject of unconscious motiva- 
tion following generally the position taken 
by Pederson-Krag (1955). Additional mate- 
rial dealing with the law of effect was drawn 
from Haire (1956). 

Lecture 5. With this lecture the emphasis 
shifted from individual factors which might 
produce ineffective performance to group fac- 
tors. The matter of a man’s relationship to 
his family was discussed, drawing heavily on 
material presented in Ginzberg et al. (1959).
The major framework was, however, indus- 
trial rather than military with particular at- 
tention devoted to performance while on busi- 
ness trips and foreign assignments. 

Lecture 6. This lecture covered some of the 
material discussed during the introductory 
session. However, the work group rather than 
the supervisor constituted the major focus. 
An attempt was made to show how various 
characteristics of, and relationships within, 
the group might contribute to the ineffective 
performance of a group member. Cohesion, 
informal leadership, and the social isolate 
were among the topics discussed. Much of 
the material was drawn from Brown (1954). 

Lecture 7. The primary emphasis was on 
the various ways in which organizational 
policy may affect both the performance of 
individual members and the standards against 
which they are evaluated. The subject of 
training and its relationship to performance 
was covered, as were sick leave policy, the 
handling of grievances, and layoffs. 

Lecture 8. After a rather brief discussion of 
cultural values and the manner of their trans- 
mission, an attempt was made to demonstrate 
how conflicting values may contribute to in- 
effective performance. Much of the material 
was drawn from Ginzberg et al. (1959), es- 
pecially that dealing with values placed on 
equity, freedom of choice, and morality. In 
addition there was some discussion of the 
problems that may arise when the value 
placed on individual freedom of choice by those with scientific training comes into conflict with the demands of the industrial situation.

Lecture 9. With this lecture another major 
shift occurred. Factors within the individual and factors related to various groups that might have a negative impact on performance had now been dealt with; the third major type of determinant, the situational, remained. Within this category, which also includes economic and political conditions, geographical location, etc., the general topic of situationally aroused fear reactions was given primary consideration. Subjects covered were
subjective and objective fear reactions, types 
of situations which may arouse anxiety, and 
the psychodynamics of phobic reactions. 

Lecture 10. As indicated previously this 
final session focused explicitly on ineffective 
performance in the supervisory job. Again the 
emphasis was on situational factors, but the 
framework was further narrowed. The in- 
structor returned to the results of the Uni- 
versity of Michigan studies presented in the 
first lecture and attempted to show how the 
differences in behavior characteristic of supervisors with highly productive and less productive work groups might be attributed
to differences in emotional response to the 
supervisory situation itself. Special note was 
made of the fact that even though a man was 
well informed as to effective methods of su- 
pervision, he might still be quite unable to 
utilize this knowledge, due to anxiety aroused 
by the supervisory situation. 

THE EVALUATION PROCEDURE 

A measure of attitude toward various aspects of 
supervisory work was developed and administered in 
the usual pre- and posttest design with a single con- 
trol group. Participation in the evaluation study was 
on a voluntary basis and as a result only 55 of the 
72 supervisors who completed the course actually 
were included. The remaining 24% failed to return 
the initial test and thus could not be included in the 
experimental group. There is no reason to believe 
that the inclusion of these additional supervisors 
would have materially altered the findings obtained. 
Nevertheless this possibility cannot be completely 
ruled out. The control group consisted of 30 supervisors within the Research and Development Department who did not attend the course in psychology and who volunteered to participate in the study.

The evaluation instrument was a specially devel- 
oped sentence completion test designed to measure 
attitude toward a number of aspects of the supervisory job. The test contained 40 stems, only 35 of which were scored. The stems were deliberately selected to yield information on certain attitudes without at the same time revealing the purpose of the test. Thus, stems intended to produce answers indicative of attitudes toward authority figures included My family doctor . . . as well as Top management . . . . Actually, the majority of items referred to situations outside the work environment or not specifically related to the work environment, in spite of the fact that the test was intended to measure attitudinal reactions characteristic of the man’s working situation. As far as could be determined, the supervisors had no idea of what the test
really measured and thus were in no position to se- 
lect their responses in such a way as to give a good 
impression. Nor could they consciously manipulate 
their answers with a view to predetermining the outcome of the study. In several of the groups there was some discussion of the evaluation instrument during the introductory session, and a number of guesses were offered as to its purpose. Although highly imaginative, these guesses had little relationship to the true objective.

Each individual item response was scored as positive, neutral, or negative depending on whether or not there was an expression of attitude toward the activity, event, or individual specified in the stem, and the direction of the attitude if one was in evidence. The 35 scorable items were selected to fall in seven different categories, each of which contained five items. The seven categories, typical items, and the general characteristics of responses labeled positive or negative are indicated below. On the average, approximately half of the items in a record were scored neutral because no attitude was indicated or, although this was relatively rare, because of skipping. Ambivalent responses were considered negative.

1. Attitude toward authority figures. (My family doctor . . .; Policemen . . . .) Positive attitude was indicated by any expression of liking, praise, respect, appreciation, confidence, or a feeling that better treatment is deserved. Negative attitude was indicated by any expression of criticism, negative emotional reactions, or a feeling that the rewards received are greater than deserved.

2. Attitude toward competitive games. (Playing golf . . .; When playing cards, I . . . .) Positive attitude was indicated by any reference to attempting to win, interest in participation, pleasant emotional reactions or physical sensations, and expectation of success. Negative attitude was indicated by any reference to a lack of interest in participation, unpleasant emotional reactions or physical sensations, expectation of failure, and criticism of the activity.

3. Attitude toward competitive situations generally. (Running for political office . . .; Final examinations . . . .) Positive and negative attitudes were scored essentially in the same manner as for competitive games.


4. Attitude toward taking a masculine role. (Shooting a rifle . . .; Wearing a necktie . . . .) Positive attitude was indicated by any expression of liking, positive physical sensation or emotional reaction, favorable opinion, being helped, wish to participate, or a feeling of confidence in one’s ability to perform well. Negative attitude was indicated by any expression of dislike, negative emotional reaction or physical sensation, unfavorable opinion, lack of interest, or a wish not to participate.

5. Attitude toward imposing one’s wishes on others. (Punishing children . . .; When one of my men asks me for advice . . . .) Positive attitude was indicated by references to positive emotional reactions, wishes for participation, liking or being interested in the activity, favoring participation, association of the activity with success, and any evidence that the man does actually impose his wishes on others. Negative attitude was indicated by references to negative emotional reactions, a wish not to participate, the undesirability of the activity, and any suggestion that the man either does not impose his wishes on others or characteristically fails in attempting to do so.

6. Attitude toward standing out from the group. (Presenting a report at a staff meeting . . .; Making introductions . . . .) Positive attitude was indicated by reference to positive emotion, expectation of success, the helpfulness of the activity, favorable opinion, liking, interest, or a wish to participate. Negative attitude was indicated by reference to dislike, failure to function effectively in the situation, negative emotional reactions, criticism of the activity, or a wish not to participate.

7. Attitude toward routine administrative functions. (Dictating letters . . .; Decisions . . . .) Positive attitude was indicated by any expression of a wish to participate, positive emotions, favorable opinion, positive physical sensation, liking, and evidence that the man either does participate in the activity or feels it to be helpful. Negative attitude was indicated by any expression of a wish not to participate, negative physical sensation or emotional reaction, criticism, or suggestion of poor performance in the situation.

These characteristics were selected as being among those required for effective performance in the supervisory job. It was assumed that a supervisor should have a reasonably favorable attitude toward authority figures, should like competitive situations or at least not dislike them, should prefer the masculine role, should have no difficulty in bringing himself to impose his wishes on others, should not wish to avoid standing out from the group, and should be willing to perform routine administrative functions. The more positive his attitude toward such things, the more effective his performance. Evidence on the validity of these assumptions will be presented after a brief discussion of the scoring procedure.

The responses of a normative group consisting of 100 research and development supervisors (including the 85 who participated in the training evaluation study) and 20 industrial relations supervisors were categorized as positive, negative, or neutral. Keys consisting of all combinations of the five items in a category were then constructed. There were 31 such keys for each characteristic, obtained by taking the items one at a time, two at a time, three at a time, four at a time, and five at a time. The responses of the normative group were then matched against the keys, and those keys (usually those based on a larger number of items) which yielded rare frequencies, that is, patterns met by 5% or less of the normative sample, were noted. Each key was scored for positive and negative responses separately, so that there were actually 62 keys for each of the seven characteristics. Any individual who gave a pattern of either positive or negative responses to a given set of five items which matched one of the keys previously found to be rare in the normative group was considered as possessing either a positive or negative attitude toward that particular aspect of the supervisory job. This procedure is similar to that employed in scoring the Tomkins-Horn Picture Arrangement Test. The rationale on which it is based has been presented elsewhere (Tomkins & Miner, 1957, 1959).

In addition to the seven categories already noted, an eighth was developed based on the patterning of responses throughout the total test. This might be considered a measure of attitude toward the supervisory job as a whole rather than toward specific aspects. Within each of the seven categories the five items were ranked in the order of the frequency with which they elicited positive responses in the normative group. Five keys were then constructed, one for each level of rank, each containing seven items. These keys were then combined to form keys of 14, 21, 28, and 35 items in the same manner as individual items had been combined previously. There were thus 31 keys for positive attitudes and, after a similar procedure had been carried out based on the frequencies for negative responses in the normative group, 31 keys for negative attitudes. In the case of these general keys the matching against an individual’s response pattern was carried out to determine whether he had matched the key on enough items to yield a rare. It was not a matter, as previously, of noting whether a key resulted in a rare in the normative group; all of these keys were rare by this definition. Rather, the matching was carried out to determine whether the individual had matched the key a sufficient number of times to mark him as exceeding 95% of the normative group. This is identical to the procedure employed with the Tomkins-Horn Picture Arrangement Test.
For the purposes of this no attempt 
made to determine how rares an 
might have obtained on keys within a given cate 
If a rare was found on at least one key that 
was enough to label his attitude as positive or nega 
An individual’s total score on the test was the 
im of his positive rares (maximum possible, 8) minus 
of his negative rares (maximum possible, 
This total score could vary from —8 through 0 
r8. 
The reliability of the scoring was checked by re- 


negative 
more 


out met on 
was 
the 


toto 


was keys 


group 
were rare 


met key a to 
This procedure 
the Arrangement 
study was 
many individual 
gory 


tive 
st 
the sum 
8) 


to 





228 


scoring 20 protocols at a four-month interval. Per- 
fect agreement on the individual item scoring—posi- 
tive, negative, or neutral—was obtained in 95.4% of 
the The error per 1.6 
Agreement on the assignment of plus, minus, or no 
On the av- 
erage there were .25 errors of this type per record. 
An estimate of the overall reliability of the in- 
strument was obtained by comparing pre- and post- 
test scores for the control group. The r obtained was 
74. This is without question an underestimate con- 
sidering the time interval inv d and the changes 
which occurred in the group during the interval 
Validity was evaluated by comparing the scores 
with management appraisal ratings obtained in the 
Research and Development Department approximately 
a year and a half prior to the initial testing. The re- 
presented in Table 1 
on job grades. The ratings 
judgment of the man’s im- 
that man’s superior, and a rep 
resentative of Industrial Relations Department 
The rating made on a 10-point 
scale and represented an overall judgment of the 
man’s job The spe- 
considered matters as 
motivation, 
stability in 
their 


cases. average record was 


attitude scores to categories was 96.9% 


sults of these comparisons are 
with those based 
represent the 
liate supervisor, 

the 


periormance 


along 
pooled 
me 


was 


present performance. judges 


cifically such 
pe tence, 
tional 

making 


technical 
intelligence, and em 
addition to leadership prior to 
ratings. For this the 
cannot be considered a pure measure of supervisory 
skill. Even if perfect validity were p they 
would not yield a high correlation with the sentence 
completion test. They contain too many components 
other than supervisory skill. The same must be said 
of the potential for advancement ratings. These, too 
contain a variety of components. However, in the 
case of the rather large number of men pres- 
ent jobs required only a minimum of 
responsibility, the weighting 
skill is probably considerably greater 
ratings than in the performance ratin 
It will be noted that in Table 1 a 
made between the total validation sample 


com 
effective 


ratings 


eason 


esent 


supervisory 


given to supervisory 
in the potential 


distinction 


of 81 case 
rABLE 1 
BETWEEN SENTENCE COMPLI 


MEASURES O1 


EFFECTIVENESS 


ONS 


Scor! AND VARIOUS 


Measure of Effectivenc . P 


Total Supervisory Group 
Rating 81 >.10 
Potential Rating 81 < 01 


Job Grade 81 


Performance 
O05 
Group with Primary 
visory Reaponsibilit 
Rating 41 31 
Potential Rating 11 31 
Job Grade 41 15 


Performance 
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rABLE 2 
MEAN PRE- AND Postrest ScoRES FOR EXPERIMENTAI 


AND CONTROL GROUPS 


Post- 
test 


Pre- 


Group test 


Mean Mean 
1.11 
33 


Experimental ; Al 3.70 


Control 2.39 


and a special subgroup of 41. This latter group con- 
tains all men with primary supervisory responsibility. 
Those excluded are 


or three subordinates at 


first line supervisors with two 
most. Much of their work 
rather than supervisory 
smaller sample provides a more ade- 
for evaluating the validity of the sen- 
completion measure as predictor of mana- 


is of necessity technical 
Probably the 
quate basis 
tence 
rial 


success 


RESULTS 


The pre- and posttest comparisons (Table 2) 
reveal a significant rise in the score obtained 
by the experimental group and a significant 
decline for the controls. Whereas there was 
no reliable difference between the groups at 


the time of the initial testing, a clear differ- 
entiation had emerged by the time of the 
second test (t = 3.04, P < .01). Within the experimental group 64% increased their scores,
14% remained the same, and 22% gave evi- 
dence of a less favorable attitude toward the 
supervisory job. Clearly the course did not 
produce a positive effect in all cases. Nevertheless, 60% of the controls declined in score
on the second test and another 20% remained 
the same. Only 20% showed any increase 
at all. 

If the total scores for the control group are 
broken down into their positive and negative 
components, it becomes apparent (Table 3) 
that the decline in score is attributable to a shift in negative attitude only. The number of aspects of the supervisory job eliciting a positive response remained essentially the same from pre- to posttest. There was, however, a significant increase in the number of rares indicative of a negative attitude. A check on the eight individual categories reveals, as might be expected, no reliable shift in the frequencies for positive attitudes. On the negative side, however, one reliable difference was found. Rares on Negative Attitude toward Competition Generally increased from 7% to 30% in the control group (z = 2.27, P < .02 for a one-tailed test). Apparently the decline in total score for the control group is primarily attributable to an increase in negative attitudes toward competitive activity. The experimental group, however, was relatively unaffected in this respect. The experimental percentage for the posttest, 13, was almost exactly the same as that for the pretest. Whereas the experimental and control groups did not differ at the time of the initial testing, the increase in negative attitudes toward competition in the control group did produce a significant difference on retest (z = 1.65, P < .05 for a one-tailed test). These findings suggest that the training acted in such a way as to ward off or minimize those factors operating to produce an increase in negative attitudes toward competitive activity.

TABLE 3
MEAN NUMBER OF POSITIVE AND NEGATIVE RARE RESPONSES GIVEN BY CONTROL GROUP ON PRE- AND POSTTESTS

Rare          Pretest    Posttest
Response      Mean       Mean
Positive      1.30       1.03
Negative      1.00       1.37

Table 4 presents the results of a further 
analysis carried out for the experimental 
group. It indicates that the change resulting 
from the course was almost entirely a matter 
of positive attitudes. With the exception of 
the preventative effect in the area of competitive activity previously noted, no change in negative attitude frequencies occurred in the experimental group. Apparently the course had very little consistent impact on existing negative attitudes toward various aspects of supervisory work. At best it rearranged some negative attitudes, producing a shift from dislike of one aspect to dislike of another.

TABLE 4
MEAN NUMBER OF POSITIVE AND NEGATIVE RARE RESPONSES GIVEN BY EXPERIMENTAL GROUP ON PRE- AND POSTTESTS

Rare          Pretest    Posttest
Response      Mean       Mean
Positive      1.04       1.91
Negative       .92        .80
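Because each total score is positive rares minus negative rares, the change in mean total score splits into a positive and a negative component. The short Python check below is an illustrative sketch that uses only the means printed in Tables 3 and 4; the names `TABLES` and `decompose` are hypothetical.

```python
# Mean numbers of rare responses as (pretest, posttest), copied from Tables 3 and 4.
TABLES = {
    "control": {"positive": (1.30, 1.03), "negative": (1.00, 1.37)},
    "experimental": {"positive": (1.04, 1.91), "negative": (0.92, 0.80)},
}


def decompose(group):
    """Split the mean change in total score into its two components."""
    pos_pre, pos_post = TABLES[group]["positive"]
    neg_pre, neg_post = TABLES[group]["negative"]
    d_pos = pos_post - pos_pre
    d_neg = neg_post - neg_pre
    # Total = positive - negative, so the change in total is d_pos - d_neg.
    return {
        "positive change": round(d_pos, 2),
        "negative change": round(d_neg, 2),
        "total change": round(d_pos - d_neg, 2),
    }


for group in TABLES:
    print(group, decompose(group))
```

The control group’s mean score falls by .64, of which the rise in negative rares contributes .37 and the drop in positive rares .27; the experimental group’s mean rises by .99, of which .87 comes from the increase in positive rares.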

A check on the eight categories reveals that 
within the experimental group three of them 
increased significantly in frequency from pre- 
to posttest. These shifts in positive attitude 
are presented in Table 5. The P values are 
for a one-tailed test. The other five categories 
also increased in a similar manner, but not 
sufficiently to yield reliable differences. The 
findings do indicate, however, that clear-cut 
increases in positive attitude toward impos- 
ing one’s wishes on others, toward routine 
administrative functions, and toward the su- 
pervisory job generally were produced. Pre- 
sumably the course acted to arouse positive 
attitudes in these areas where no attitude, 
either positive or negative, had existed previ- 
ously. 

A series of analyses was also carried out in 
an effort to determine what factors might be 
associated with the changes occurring in the 
experimental group. These analyses were, 
however, consistently unrevealing. The posi- 
tive shift in attitude toward the supervisory 
job was equally marked among those who, at 
the time they started the course, gave evi- 
dence of a rather favorable attitude toward 
supervision and those who were originally 
somewhat negative. Similarly, job grade, performance ratings, potential for advancement ratings, and intelligence (Thurstone Test of Mental Alertness) all produced negative results.

TABLE 5
PERCENTAGE OF EXPERIMENTAL GROUP GIVING POSITIVE RARE RESPONSES IN VARIOUS CATEGORIES ON PRE- AND POSTTESTS

                                        Pretest      Posttest
Category                                Percentage   Percentage
1. Imposing One’s Wishes                11           24
2. Routine Administrative Functions     13
3. Supervisory Job                      18


DISCUSSION 


Although this study was originally designed 
to yield information regarding the impact of 
a course in psychology on attitudes toward 
supervisory work, it also appears to provide 
considerable insight into the effects of organizational change. Shortly after the course
began, rumors of a shift in the organizational 
structure of the Research and Development 
Department began to spread at a rather ac- 
celerated rate. These rumors had persisted 
over a period of several years, but without 
being given a great deal of credence. Now it 
became quite apparent that some change was 
inevitable. There was, however, no authorita- 
tive information available as to the exact na- 
ture of the reorganization, nor was there any 
clear-cut basis for predicting what personnel 
changes would be made. This state of uncer- 
tainty persisted until the end of the course. 
In fact it was not finally alleviated until a 
number of months later when several func- 
tions were shifted to another department. 

It is, of course, possible that the increase 
in negative attitudes toward supervision found 
in the control group was totally unrelated 
to the prospect of organizational change. It 
seems probable, however, that the two were 
closely associated. One might hypothesize 
that a number of supervisors, faced with a
very uncertain future, sought a degree of se- 
curity in increased allegiance to their present 
group. If such a banding together to ward off 
changes which might be introduced from the 
outside did occur, a reduction in the degree 
of emphasis on individual competition is not 
surprising. In fact the supervisors might well 
feel that individual competition was undesir- 
able, since it would tend to break up the 
group’s solidarity and make individual mem- 
bers more vulnerable. This is, of course, ex- 
planation after the fact. We do not know 
with certainty why competitive activity be- 
came so distasteful. Nevertheless, an explana- 
tion in terms of increased group solidarity in 
the face of external threat is not inconsistent 


with what we know of group dynamics. 


John B. 


Miner 


Although the shift in control group scores 
provided the most unexpected outcome of the
study, the persistence of negative attitudes 
within the experimental group was far from 
anticipated. The course was deliberately de- 
signed to build up a readiness to accept the 
interpretations presented in the last lecture. 
It was hoped that those supervisors who had 
developed negative attitudes toward certain 
aspects of supervisory work because of situa- 
tionally aroused anxiety would gain insight 
into the nature of their emotional reactions 
and consequently overcome their negative 
attitudes. This clearly did not occur on a 
widespread basis. At best some of the super- 
visors were protected against certain sources 
of insecurity which under other circumstances 
would presumably have resulted in the de- 
velopment of even more negative attitudes to- 
ward supervisory work. Certainly the results 
of the present study cannot be taken as pro- 
viding evidence for the efficacy of psychoana- 
lytic interpretations in a lecture situation. 

On the other hand there is considerable 
evidence that positive attitudes toward the 
supervisory job were aroused. Many who 
were previously neutral as regards certain 
aspects of their work developed more favor- 
able viewpoints. In all probability this was a 
result of the way in which the men were 
treated. They were told again and again that 
they were supervisors and responsible for the 
performance of their subordinates. They were 
given some insight into the complexity that 
supervisory work could assume and into the 
difficulties associated with diagnosing the 
causal factors behind a given instance of in- 
effective performance. Many, apparently, be- 
gan to view such things as getting others to 
do what they wanted and carrying out rou- 
tine administrative functions with a new re- 
spect. It seems probable that this result is 
largely attributable to the specific type of 
lecture approach employed. Research studies 
and theoretical formulations which should be 
of value to those in supervisory jobs were 
emphasized. The material was specifically de- 
signed to appeal to the men in their super- 
visory role. 

Although there is little question that attitude change was in fact produced by some aspect of the course, the question of permanency remains. Ever since the International
Harvester studies (Fleishman, Harris, & 
Burtt, 1955) it has been clear that attitude 
changes produced by a training program may 
well disintegrate under the impact of contra- 
dictory attitudes existing in the work place. 
The possibility that the changes produced as 
a result of the present course were similarly 
short-lived cannot be ruled out. There are, 
however, two arguments suggesting that the 
favorable attitudes toward supervisory work 
had some permanency. 

First, there is the manner in which the 
course was conducted. Contrary to the Inter- 
national Harvester situation, the men were 
not removed from their jobs and subjected to 
a period of concentrated training in a new 
location. The course was conducted in a con- 
ference room located in the same building as 
the offices where many of the men worked. 
Laboratories and pilot plant units were all 
nearby. The men left their work to attend 
the sessions and returned to their jobs be- 
tween one and a half and two hours later. 
Any attitude changes induced as a result of 
a given session had to meet the test of “job 
realities” for the ensuing week before the
change could again be reinforced through 
training. 

Secondly, due to certain exigencies of the 
industrial situation, it was not possible to ad- 
minister the posttest questionnaire to all of 
the men immediately upon completion of the 
course. Within both experimental and con- 
trol groups there was often considerable de- 
lay before the tests were filled out. Thirty- 
three of the experimental Ss did complete the 
questionnaire within two weeks of the time 
they finished the course, but the remaining 
22 required from two to six weeks. About 
half of this latter group did not finish the 
test until after a month had elapsed. The 
mean change in sentence completion test score 
from pretest to posttest was very nearly the 
same in both groups, .94 for the early re- 
spondents and 1.09 for the later ones. There 
was certainly no evidence of a return to prior 
attitudes within this relatively brief period 
following the completion of the course. What 
happened six months or a year later we can- 
not say. The reorganization had occurred and 
further study was not feasible. 


SUMMARY 


A group of engineers and chemists em- 
ployed as supervisors in the Research and 
Development Department of a large com- 
pany were given a course in psychology as 
part of the company’s management develop- 
ment program. The major objective in con- 
ducting the course was to foster a more fa- 
vorable attitude toward the supervisory role. 
The extent to which this objective was ac- 
complished was determined through the use 
of a sentence completion test designed to 
measure attitude toward certain aspects of 
supervisory work and scored in accordance 
with the rationale of the Tomkins-Horn Pic- 
ture Arrangement Test. 

Comparison of pre- and posttest scores in- 
dicated that the training which, with its em- 
phasis on the lecture method and research 
findings, was very similar to many academic
courses in psychology, was effective in induc- 
ing a more favorable attitude toward super- 
visory work. In addition, certain changes oc- 
curred in the control group scores, apparently 
in response to the threat of reorganization, 
which were indicative of an increase in nega- 
tive feelings toward supervisory work, espe- 
cially insofar as this work involved individual 
competitive effort. 